New Contributor
Posts: 1
Registered: 11-21-2017

Can I share a Data Frame between two spark submit jobs?

Hi,

Can I share a DataFrame between two jobs? Is it possible to reference a DataFrame that was created by another spark-submit job?

Cloudera Employee
Posts: 31
Registered: 11-16-2015

Re: Can I share a Data Frame between two spark submit jobs?

Sharing objects such as DataFrames between different spark-submit jobs is not currently supported, because each application runs with its own SparkContext. However, it would help a great deal to know your use case in as much detail as possible, along with the problem you are trying to solve by sharing DataFrames.

My understanding is that if the data changes infrequently and caching is a must-have, you can use HDFS caching. If the data changes often (i.e. records are constantly updated) and has to be shared among many different applications, use Kudu. Kudu already has basic caching capabilities: frequently read subsets of data are automatically cached.

There was a previous thread a while back along the same lines. Some options you could explore (though unsupported) are Spark-JobServer or Tachyon. However, I have not used them and can't comment beyond the references.

Contributor
Posts: 25
Registered: 06-13-2017

Re: Can I share a Data Frame between two spark submit jobs?

Usually people persist the data to HDFS, S3, or Kudu and read it back from the other job. Alternatively, you can use Alluxio (formerly Tachyon) as off-heap storage, which can be faster and more scalable.
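To make the storage route concrete, here is a minimal PySpark sketch, assuming both applications can reach the same filesystem. The path, the app names, the sample rows, and the helper names `publish` and `load` are all hypothetical and only for illustration. The first job writes the DataFrame as Parquet, and the second job rebuilds a DataFrame from those files.

```python
# Hypothetical shared location; any path both jobs can reach (HDFS, S3, ...) would do.
SHARED_PATH = "hdfs:///tmp/shared/customers.parquet"


def publish(df, path=SHARED_PATH):
    """Job A: persist the DataFrame so another application can load it."""
    # Parquet stores the schema with the data; "overwrite" replaces an earlier run.
    df.write.mode("overwrite").parquet(path)
    return path


def load(spark, path=SHARED_PATH):
    """Job B: rebuild a DataFrame from the files job A wrote."""
    return spark.read.parquet(path)


def producer_main():
    # Entry point of the first spark-submit application (assumed name).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("producer").getOrCreate()
    df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
    publish(df)
    spark.stop()


def consumer_main():
    # Entry point of the second spark-submit application (assumed name).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("consumer").getOrCreate()
    load(spark).show()
    spark.stop()
```

Each `*_main` function would be called from its own script and submitted separately with spark-submit. Note that the second job gets a copy of the data as of the last write, not a live reference to the first job's DataFrame.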
