Support Questions

Find answers, ask questions, and share your expertise

Can I share a DataFrame between two spark-submit jobs?

New Contributor



Can I share a DataFrame between two jobs? Is it possible to reference a DataFrame that was created by another spark-submit job?


Expert Contributor

Object sharing between different spark-submit jobs is not currently supported. However, it would help immensely to know your use case in as much detail as possible, and the problem you are trying to solve by sharing DataFrames.


My understanding is: if the data changes infrequently and caching is a must-have, you can use HDFS caching. If the data changes often, i.e. records are constantly being updated, and the data has to be shared among many different applications, use Kudu. Kudu already has basic caching capabilities, where frequently read subsets of data are cached automatically.
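For the HDFS caching option, the rough shape of the admin commands looks like the following. This is only a sketch: it requires a running HDFS cluster with centralized cache management enabled, and the pool name and path here are hypothetical.

```shell
# Create a cache pool and pin a frequently-read dataset into it.
# "shared-pool" and /data/reference are placeholder names.
hdfs cacheadmin -addPool shared-pool
hdfs cacheadmin -addDirective -path /data/reference -pool shared-pool

# Verify which directives are active
hdfs cacheadmin -listDirectives
```

Any application reading /data/reference then benefits from the centralized cache transparently, without changes to the application code.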


There was a previous thread a while back along the same lines, and some options you could explore (though unsupported) are Spark-JobServer or Tachyon. However, I have not used them and can't comment beyond the references.


Usually people use HDFS, S3, or Kudu, or you can use Alluxio (formerly Tachyon) as an off-heap storage layer, which is faster and more scalable.
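In all of these options the pattern is the same: since two spark-submit jobs run as separate processes, one job must persist the DataFrame to shared storage and the other must re-read it. A minimal sketch of that handoff, using two plain Python functions and a JSON file to stand in for the two jobs and the shared store (in Spark this would be `df.write.parquet(path)` in the first job and `spark.read.parquet(path)` in the second; all names and paths here are illustrative):

```python
import json
import os
import tempfile

def producer_job(shared_path):
    # Job 1: computes some rows and persists them to the shared location,
    # analogous to df.write.mode("overwrite").parquet(shared_path).
    rows = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
    with open(shared_path, "w") as f:
        json.dump(rows, f)

def consumer_job(shared_path):
    # Job 2: a separate process cannot see job 1's in-memory objects;
    # it re-reads the persisted data, analogous to
    # spark.read.parquet(shared_path).
    with open(shared_path) as f:
        return json.load(f)

shared_path = os.path.join(tempfile.mkdtemp(), "shared_df.json")
producer_job(shared_path)
rows = consumer_job(shared_path)
print(rows)  # the second job sees exactly what the first job wrote
```

The storage backend (HDFS, S3, Kudu, Alluxio) changes the path and the performance characteristics, but not the write-then-read shape of the handoff.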