Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

is the spark window function operates on the whole time range each trigger or just the incremental data?

is the spark window function operates on the whole time range each trigger or just the incremental data?

hi,

i looked at the tutorials and over the web , but i am not sure if :

by operating the window function in structured streaming , like :

inputDF.groupBy($"action", window($"time", "1 hour")).count()
       .writeStream.format("jdbc")
       .start("jdbc:mysql//…")

it means that the streamed data is grouped by 1 hour , but does it means that if i save the state of the query (with checkpoint() ?) , then the aggregation is calculated only on the added data and not on the whole windowed data , using the state for this ?

if the state doesn't help here - how can i save execution time by taking only the delta data into consideration for aggregation ongoing functions ?

Don't have an account?
Coming from Hortonworks? Activate your account here