08-26-2014 02:36 AM
We are trying to implement the windowing concept on a live stream using Spark Streaming.
We need to capture the data of a particular window, analyze it against existing (historical) data, and also apply some transformations to the captured data.
How can we do this? How can we cache or save the data of a particular window, or apply transformations to a particular window?
Take the example of the statement below. With a live stream of numbers from 1 to 1000000000000, what value will the variable windowedWordCounts hold after 30 and 60 seconds?
I am new to Spark Streaming, so I would appreciate some help in understanding this. Thanks!
val windowedWordCounts = pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))
08-26-2014 02:40 AM
Hi Arun, I'd encourage you to read the Spark Streaming programming guide (which seems to be where your example came from) at https://spark.apache.org/docs/latest/streaming-programming-guide.html and try it yourself; that will be more efficient than re-explaining the details here. windowedWordCounts is a DStream whose RDDs contain (word, count) pairs; each RDD holds the counts for one window.
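To build intuition for what the variable holds at a given point in time, here is a small self-contained Scala sketch (plain Scala, not Spark; the object and helper names are my own for illustration). It simulates reduceByKeyAndWindow with a 10-second batch interval, a 30-second window (3 batches), and a 10-second slide: at each slide, the counts from the last 3 batches are summed per key, and anything older falls out of the window.

```scala
// Plain-Scala simulation of reduceByKeyAndWindow semantics (illustrative only).
// Batch interval = 10s, window = 30s (3 batches), slide = 10s (every batch).
object WindowDemo {
  type Batch = Map[String, Int]

  // Merge two (word -> count) maps, mirroring (a: Int, b: Int) => a + b per key.
  def merge(a: Batch, b: Batch): Batch =
    (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0) + b.getOrElse(k, 0))).toMap

  // For each slide point, reduce over the last `len` batches.
  def windowed(batches: Seq[Batch], len: Int): Seq[Batch] =
    batches.indices.map(i => batches.slice(math.max(0, i - len + 1), i + 1).reduce(merge))

  def main(args: Array[String]): Unit = {
    // Six 10-second batches; each contributes one unique word and "common" -> 2.
    val batches = (1 to 6).map(i => Map(s"word$i" -> 1, "common" -> 2))
    val wins = windowed(batches, 3)
    // After 30s the window covers batches 1..3: "common" has count 2 + 2 + 2 = 6.
    println(wins(2)("common"))
    // After 60s the window covers batches 4..6 only; "word1" has aged out.
    println(wins(5).contains("word1"))
  }
}
```

So the RDD in windowedWordCounts at any slide contains only the counts accumulated over the most recent 30 seconds of the stream, not over everything seen so far.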
08-26-2014 04:13 AM
Actually, I posted the details here because I was not able to save a particular window's data into HDFS.
The data was not getting saved per window as expected, so I thought there might be a misunderstanding on my part.
Thanks, will look into it again.
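For saving each window's data, here is a minimal sketch of the two usual approaches, assuming a StreamingContext ssc and a pairs DStream[(String, Int)] as in the original post; the HDFS paths are placeholders, and this requires a running Spark Streaming application, so it is a sketch rather than a tested program.

```scala
// Sketch only: assumes an existing StreamingContext `ssc` and a `pairs`
// DStream[(String, Int)] as in the post above. HDFS paths are placeholders.
val windowedWordCounts =
  pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

// Option 1: let Spark write one output directory per window slide,
// named with the prefix plus the batch time.
windowedWordCounts.saveAsTextFiles("hdfs:///user/arun/windowed-counts")

// Option 2: foreachRDD gives custom control over each window's RDD,
// e.g. to choose the output path per window.
windowedWordCounts.foreachRDD { (rdd, time) =>
  rdd.saveAsTextFile(s"hdfs:///user/arun/windowed-counts-${time.milliseconds}")
}
```

Note that with a 10-second slide, a new directory is written every 10 seconds, and each one contains the counts over the preceding 30-second window, so consecutive outputs overlap by design.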