I recently learned that Spark 2.0 will include Structured Streaming, which treats a stream as an unbounded DataFrame/Dataset that keeps growing as new data arrives. As I understand it, the data is held in memory and spills to disk via Tachyon (since renamed Alluxio), which can sit on top of any number of different underlying storage systems. The benefit is that we are further abstracted from the details of how the data is stored; that is already handled for us. This leaves us free to focus on the data structures and the processing. Data formats, folders, etc. are no longer a concern.
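
To make the idea of a DataFrame that never ends a bit more concrete, here is a minimal sketch in plain Python. It is not Spark code and does not use any Spark API. It only imitates the model: new rows keep arriving in small batches, and the result of a query (here a word count) is updated after each batch. The function name `running_word_counts` and the sample batches are made up for illustration. In Spark itself, the entry points for this model are `spark.readStream` and `DataFrame.writeStream`.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def running_word_counts(batches: Iterable[List[str]]) -> Iterator[Dict[str, int]]:
    """Yield the updated word counts after each micro-batch arrives."""
    counts: Counter = Counter()  # state carried across batches
    for batch in batches:
        for line in batch:
            counts.update(line.split())
        yield dict(counts)  # snapshot of the result so far


# Hypothetical input: three micro-batches of text lines (the last one is empty).
batches = [["spark streaming"], ["spark sql", "streaming"], []]
results = list(running_word_counts(batches))

for i, snapshot in enumerate(results):
    print(f"after batch {i}: {snapshot}")
```

The caller never sees where the earlier rows are kept. It only sees the result growing as data arrives, which is the kind of abstraction described above.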