Created 05-16-2016 10:07 AM
I recently learned that Spark 2.0 will include Structured Streaming, which treats a stream as an unbounded (never-ending) DataFrame/Dataset. As I understand it, the data will be kept in memory and spilled to disk through Tachyon, which can sit on top of any number of different underlying storage systems. The benefit is that we are further abstracted from the details of how the data is stored, since that is handled for us. That leaves us free to focus on the data structures and the processing; data formats, folders, etc. are no longer a concern.
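To make the "unbounded DataFrame" idea a bit more concrete, here is a rough sketch of how I picture it, in plain Python rather than Spark. None of these names (`UnboundedTable`, `append_batch`, `run_query`) are Spark APIs; they are made up for illustration. The assumption is that new rows only ever get appended to the table, and that a running query (here a word count) is updated from each new batch instead of rescanning everything seen so far.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


class UnboundedTable:
    """Toy stand-in for an unbounded table: rows are only ever appended.

    Hypothetical class for illustration; this is not how Spark implements it.
    """

    def __init__(self) -> None:
        self.rows_seen = 0
        # Running result of a "group by word, count" query over all rows so far.
        self.counts: Counter = Counter()

    def append_batch(self, batch: Iterable[str]) -> Dict[str, int]:
        """Fold one batch of new rows into the running aggregate.

        Only the new rows are processed; earlier rows are never rescanned.
        """
        for word in batch:
            self.counts[word] += 1
            self.rows_seen += 1
        return dict(self.counts)


def run_query(batches: Iterable[List[str]]) -> Iterator[Dict[str, int]]:
    """Yield the updated query result after each arriving batch."""
    table = UnboundedTable()
    for batch in batches:
        yield table.append_batch(batch)


# Three small batches standing in for a stream that never ends.
results = list(run_query([["a", "b"], ["a"], ["c", "a"]]))
for i, snapshot in enumerate(results):
    print(f"after batch {i}: {snapshot}")
```

The point of the sketch is only that the code describes the query, not where or how the rows are stored.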
If anyone has more information on this, please let me know.
Thanks,
Ben