
Does the Spark window function operate on the whole time range on each trigger, or only on the incremental data?


I have looked at the tutorials and searched the web, but I am still not sure about the following:

When applying a window function in Structured Streaming, like:

inputDF.groupBy($"action", window($"time", "1 hour")).count()

I understand that the streamed data is grouped into 1-hour windows. But does that mean that if I persist the state of the query (with `checkpoint()`?), the aggregation is then computed only on the newly arrived data rather than on all the data in the window, using the saved state?
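For context, here is a fuller sketch of the kind of query I mean. The source, paths, and column names are placeholders I made up for illustration; as far as I understand, the state lives under the `checkpointLocation` set on the sink, not via a `checkpoint()` call on the DataFrame, but please correct me if that's wrong:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

object WindowedCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("windowed-count-sketch")
      .getOrCreate()
    import spark.implicits._

    // Assumed source and schema: each row carries an event time and an action name.
    val inputDF = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()
      .selectExpr("current_timestamp() AS time", "value AS action")

    val counts = inputDF
      .groupBy($"action", window($"time", "1 hour"))
      .count()

    counts.writeStream
      .outputMode("update")                      // emit only the windows touched by each micro-batch
      .option("checkpointLocation", "/tmp/ckpt") // placeholder path; state and progress are persisted here
      .format("console")
      .start()
      .awaitTermination()
  }
}
```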

If the state doesn't help here, how can I save execution time by having the ongoing aggregation consider only the delta data?
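For example, is adding a watermark the intended way to keep the state bounded, so that Spark drops the state for windows that are older than the allowed lateness instead of keeping or recomputing them? A sketch of what I tried (the `inputDF` and column names are the same assumed ones as above):

```scala
import org.apache.spark.sql.functions.{col, window}

// Assumption: the watermark tells Spark how late events may arrive, so state
// for windows older than (max event time seen - 10 minutes) can be discarded.
val counts = inputDF
  .withWatermark("time", "10 minutes")
  .groupBy(col("action"), window(col("time"), "1 hour"))
  .count()
```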