Welcome to the Cloudera Community

Are there any plans to include Tachyon (Alluxio) into CDH?

I recently learned that Spark 2.0 will include Structured Streaming, which treats a stream as an unbounded (never-ending) DataFrame/Dataset. As I understand it, this keeps data in memory and spills to disk using Tachyon, which can in turn store data in any of several underlying storage systems. The benefit is another layer of abstraction over how the data is actually stored, because that is handled for us. We are left to focus on the data structures and the processing, and data formats, folders, etc. are no longer a concern.

If anyone has information about plans to include Tachyon (Alluxio) in CDH, please let me know.

Thanks,

Ben

Who agreed with this topic