In Spark Streaming, how can I load data received in different events into a single collection? Let's say 1000 records are streamed on event 1 and another 1000 records are streamed on event 2. At the end of event 2, I want the data from both events (1000 + 1000) together.
I tried using an accumulable collection in Spark to accumulate the data streamed in the different events, but it did not work. Please help.
Spark Streaming provides windowed computations, which allow you to apply
transformations over a sliding window of data. Every time the window slides over a source DStream,
the source RDDs that fall within the window are combined and operated upon to produce the
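RDDs of the windowed DStream. So in your case you can define a window that spans both event 1 and event 2, which gives you a single combined RDD that you can save or process further. Note that the window length and the slide interval must both be multiples of the batch interval of the source DStream.
Here is a sample in Scala, taken from the "Window Operations" section of the Spark Streaming Programming Guide.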
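import org.apache.spark.streaming.Seconds
// `pairs` is a DStream[(String, Int)] of (word, 1) pairs, as in the guide's word count example
// Reduce last 30 seconds of data, every 10 seconds
val windowedWordCounts = pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))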
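If you want the raw records of both events rather than a reduced value, the plain window transformation may be closer to what you need. The following is only a minimal sketch under some assumptions: each "event" corresponds to one 10-second batch, the two batches are simulated with queueStream instead of a real receiver, and the object name CombineTwoBatches, the 10-second batch interval and the 45-second timeout are all made up for illustration.

```scala
import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical example object; not part of the Spark guide.
object CombineTwoBatches {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("CombineTwoBatches")
    // Assumption: one "event" arrives per 10-second batch.
    val ssc = new StreamingContext(conf, Seconds(10))

    // Stand-in for a real source: two RDDs of 1000 records each,
    // delivered one per batch by queueStream.
    val queue = mutable.Queue[RDD[Int]](
      ssc.sparkContext.parallelize(1 to 1000),
      ssc.sparkContext.parallelize(1001 to 2000)
    )
    val events = ssc.queueStream(queue, oneAtATime = true)

    // A 20-second window that slides every 20 seconds, so each windowed RDD
    // is the union of two consecutive 10-second batches.
    val combined = events.window(Seconds(20), Seconds(20))

    // Work on the combined RDD here (count it, save it, join it, ...).
    combined.foreachRDD { rdd =>
      println(s"records in window: ${rdd.count()}")
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(45000) // run long enough to see the first windows
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

With these settings the first windowed RDD should cover both simulated batches. If the events do not line up with fixed batch boundaries, or you need to keep accumulating across an unbounded number of events, a window is probably not enough and a stateful operation such as updateStateByKey would be worth a look.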