Welcome to the Cloudera Community

benassi · 05-16-2016

I recently learned that Spark 2.0 will include Structured Streaming, which works with unbounded (continuously growing) DataFrames/Datasets. As I understand it, the data is held in memory and spilled to disk through Tachyon (now Alluxio), which can store data in a number of different underlying storage systems. The benefit is that we are further abstracted from the details of how and where data is stored, since that is handled for us. That leaves us free to focus on the data structures and the processing; data formats, folder layouts, etc. are no longer a concern.

If anyone has information on whether there are plans to include Tachyon (Alluxio) in CDH, please let me know.

Thanks,

Ben

Are there any plans to include Tachyon (Alluxio) into CDH?