Explorer
Posts: 62
Registered: 01-22-2014

Spark windowing on a live stream

Hi,

We are trying to use windowing on a live stream with Spark Streaming.

We need to capture the data of a particular window, analyze it against our existing (historical) data, and also apply some transformations to the captured data.

How can we do this? How can we cache or save the data of a particular window, or apply transformations to a particular window?

Take the example of the statement below. In the case of a live stream of numbers from 1 to 1000000000000, what value will the variable windowedWordCounts hold after 30 and 60 seconds?

I am new to Spark Streaming, so I would appreciate some help in understanding this. Thanks!

 

val windowedWordCounts = pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))
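
For context, the surrounding code in our job is roughly along these lines (the socket source, host, and port are simplified placeholders for illustration):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// 10-second batch interval; the window and slide durations must be multiples of this
val conf = new SparkConf().setAppName("WindowedWordCount")
val ssc = new StreamingContext(conf, Seconds(10))

// Illustrative source: whitespace-separated words read from a socket
val lines = ssc.socketTextStream("localhost", 9999)
val pairs = lines.flatMap(_.split(" ")).map(word => (word, 1))

// The statement in question: counts over the last 30 seconds, recomputed every 10 seconds
val windowedWordCounts = pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))
windowedWordCounts.print()

ssc.start()
ssc.awaitTermination()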
Cloudera Employee
Posts: 366
Registered: 07-29-2013

Re: Spark windowing on a live stream

Hi Arun, I'd encourage you to read the Spark docs (which seem to be where your example came from) at https://spark.apache.org/docs/latest/streaming-programming-guide.html and try it yourself. It's more efficient than explaining the details again here. windowedWordCounts is a DStream of RDDs of (word, count) pairs; each RDD contains the counts for one window.
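
As a rough sketch of how you can inspect what each window holds (the output handling is just illustrative):

// Each RDD in windowedWordCounts covers one 30-second window, emitted every 10 seconds
windowedWordCounts.foreachRDD { (rdd, time) =>
  // rdd holds the (word, count) pairs aggregated over the window ending at `time`
  println(s"Window ending at $time: " + rdd.take(10).mkString(", "))
}

In the numbers example, each window would therefore only reflect the values received during the most recent 30 seconds, recomputed every 10 seconds.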

Explorer
Posts: 62
Registered: 01-22-2014

Re: Spark windowing on a live stream

Actually, I posted the details here because I was not able to save a particular window's data into HDFS.

The data was not getting saved per window as expected, so I thought there might be a misunderstanding on my part.

Thanks, will look into it again.
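
For reference, a minimal sketch of one way to persist each window's data to HDFS (the output path here is just illustrative):

// Write each window's (word, count) pairs to a separate, time-stamped HDFS directory
windowedWordCounts.foreachRDD { (rdd, time) =>
  if (!rdd.isEmpty()) {
    rdd.saveAsTextFile(s"hdfs:///tmp/windowed-counts/${time.milliseconds}")
  }
}

// Alternatively, DStream.saveAsTextFiles writes one directory per batch automatically:
// windowedWordCounts.saveAsTextFiles("hdfs:///tmp/windowed-counts")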
