Archives of Support Questions (Read Only)

This is an archived, read-only board kept for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Is there a way to get time-based ticks/triggers in a Spark Streaming job?

1 ACCEPTED SOLUTION

Master Guru

That is one of the things that is more natural in Storm. I think your only option is to set a fairly low base batch frequency and then either check for the time/trigger event yourself (in code that gets executed per partition, e.g. inside a mapPartitions) or to use a trigger input (for example a Kafka topic with control commands) and join it with your main data stream.

The first approach would be, in pseudo-code:

    inputStream.mapPartitions { records =>
      val command = <load trigger from database, HBase, whatever...>
      // transform your data flow based on command
    }
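To make the pattern concrete, here is a rough stand-alone simulation of that per-partition control check in plain Python (not actual Spark Streaming code, since that needs a running cluster). `CONTROL_STORE`, `load_command`, and the "UPPERCASE" command are all illustrative stand-ins for the external store and control commands mentioned above:

```python
# Stand-in for the external control store (HBase, a database, etc.).
CONTROL_STORE = {"command": "UPPERCASE"}

def load_command():
    # In a real job this would be an HBase get or a database query,
    # executed once per partition per micro-batch.
    return CONTROL_STORE["command"]

def process_partition(records):
    # This is the function you would hand to inputStream.mapPartitions(...).
    command = load_command()
    for r in records:
        if command == "UPPERCASE":
            yield r.upper()  # transform the data flow based on the command
        else:
            yield r          # pass records through unchanged

print(list(process_partition(iter(["a", "b"]))))  # -> ['A', 'B']
```

Because the command is reloaded each batch, changing the value in the external store takes effect on the next micro-batch without restarting the job.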


2 REPLIES



I ended up creating an additional source upstream that generates "tick" events at my specified interval, then joined the two RDDs. Every interval, the RDD element from the "tick" stream has a non-zero value.
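A rough simulation of that tick-stream idea in plain Python (not the poster's actual code; `TICK_INTERVAL` and the batch layout are illustrative assumptions): a tick source emits a non-zero marker once per interval, and each micro-batch of data is paired with the tick value for that batch, so downstream code can detect interval boundaries with a simple non-zero check:

```python
TICK_INTERVAL = 3  # fire a tick every 3 micro-batches (assumption)

def tick_for_batch(batch_index):
    # Non-zero only on interval boundaries, mirroring the "tick" stream.
    return 1 if batch_index % TICK_INTERVAL == 0 else 0

def join_with_ticks(batches):
    # Stands in for joining the data stream with the tick stream:
    # each batch is paired with that interval's tick value.
    for i, batch in enumerate(batches):
        yield (tick_for_batch(i), batch)

print(list(join_with_ticks([["a"], ["b"], ["c"], ["d"]])))
# -> [(1, ['a']), (0, ['b']), (0, ['c']), (1, ['d'])]
```

Downstream logic then runs its time-based work only when the tick value is non-zero, exactly once per interval, regardless of what the data batches contain.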