Support Questions
Find answers, ask questions, and share your expertise

aggregating data based on timestamp


Hi guys

I am working with Scala and Spark, and I would like to know the best API or approach for aggregating data over timestamp intervals. My data includes a timestamp column sampled every second, and I would like to produce a new DataFrame aggregated over a chosen interval. For example, since my timestamp column is recorded every second, how would my data look if it were aggregated as though the timestamp were taken every minute?

val spark = SparkSession.builder.master("local").appName("my-spark-app").getOrCreate()
val df = spark.read.option("header", true).csv("C:/Users/mhattabi/Desktop/Clean _data/Mud_Pumps _Cleaned/Set_1_Mud_Pumps_Merged.csv")
val dfTs = df.withColumn("DateTime", df("DateTime").cast("timestamp")) // cast the DateTime column to timestamp

Thanks, any help would be appreciated.



Do you want to group by timestamp, or do you want to pick the latest timestamp?

Group by timestamp/60 (using the epoch seconds, with integer division) to aggregate by minute, timestamp/3600 to aggregate by hour, and so on.
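If you are on Spark 2.x, another option is the built-in `window` function from `org.apache.spark.sql.functions`, which buckets a timestamp column into tumbling windows directly. A minimal sketch under some assumptions: the column names `DateTime` and `value` and the use of an average as the aggregate are illustrative, not from your data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col, window}

object MinuteAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local").appName("minute-agg").getOrCreate()
    import spark.implicits._

    // Hypothetical per-second sample data; replace with your own CSV load.
    val df = Seq(
      ("2017-06-01 00:00:01", 10.0),
      ("2017-06-01 00:00:02", 12.0),
      ("2017-06-01 00:01:05", 20.0)
    ).toDF("DateTime", "value")
      .withColumn("DateTime", col("DateTime").cast("timestamp"))

    // Group rows into 1-minute tumbling windows and average each window.
    val perMinute = df
      .groupBy(window(col("DateTime"), "1 minute"))
      .agg(avg(col("value")).as("avg_value"))
      .orderBy("window")

    perMinute.show(truncate = false)
    spark.stop()
  }
}
```

The `window` column in the result holds the start and end of each one-minute bucket, so you can see which interval each aggregated row covers.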