Can I share a Data Frame between two spark submit jobs?

New Contributor

Hi,

Can I share a DataFrame between two jobs? Is it possible to reference a DataFrame that was created by another spark-submit job?

2 REPLIES

Re: Can I share a Data Frame between two spark submit jobs?

Expert Contributor

Sharing objects between different spark-submit jobs is not currently supported. However, it would help a great deal to know your use case in as much detail as possible, and the problem you are trying to solve by sharing DataFrames.

 

My understanding is that if the data changes infrequently and caching is a must-have, you can use HDFS caching. If the data changes often, i.e. records are constantly updated and the data has to be shared among many different applications, use Kudu. Kudu already has basic caching capabilities: frequently read subsets of data are automatically cached.

 

There was an earlier thread along the same lines, and some options you could explore (though unsupported) are Spark-JobServer or Tachyon. However, I have not used them and can't comment beyond the references.

Re: Can I share a Data Frame between two spark submit jobs?

Explorer

Usually people use HDFS, S3, or Kudu. Alternatively, you can use Alluxio (formerly Tachyon) as off-heap storage, which is faster and more scalable.
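
For illustration, the storage-based approach could look roughly like this in PySpark: the first job writes the DataFrame out as Parquet and the second job reads it back. This is only a minimal sketch under assumptions; the path `hdfs:///tmp/shared/customers.parquet`, the helper names `write_shared`, `read_shared` and `run`, and the sample rows are all made up for the example and would need to be adapted to your cluster.

```python
# Hypothetical shared location; any path both jobs can reach (HDFS, S3, ...) would do.
SHARED_PATH = "hdfs:///tmp/shared/customers.parquet"


def write_shared(df, path=SHARED_PATH):
    """First job: persist the DataFrame so that other applications can load it."""
    # Overwrite so that re-running the writer replaces the previous snapshot.
    df.write.mode("overwrite").parquet(path)


def read_shared(spark, path=SHARED_PATH):
    """Second job: rebuild a DataFrame from the files the first job wrote."""
    return spark.read.parquet(path)


def run(role, path=SHARED_PATH):
    """Entry point each spark-submit job could call with its own role."""
    # Imported lazily so the helpers above can be imported without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("shared-df-" + role).getOrCreate()
    try:
        if role == "writer":
            # Made-up sample data standing in for the real DataFrame.
            df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
            write_shared(df, path)
        else:
            read_shared(spark, path).show()
    finally:
        spark.stop()
```

Each job would call `run("writer")` or `run("reader")` from the script it passes to spark-submit. Note that the second job gets a new DataFrame built from the stored files, not a reference to the first job's in-memory object, so the reader only sees what the writer has finished writing.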