aggregating data based on timestamp

Hi guys

I am working with Scala and Spark. What is the best API, or the best way, to aggregate data over intervals of a timestamp? My data has a timestamp column sampled every second, and I would like to build a new DataFrame by aggregating over a coarser interval. For example, since my timestamp column is sampled every second, what would my data look like if it were sampled every minute instead?

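import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.master("local").appName("my-spark-app").getOrCreate()
val df = spark.read.option("header", true).csv("C:/Users/mhattabi/Desktop/Clean _data/Mud_Pumps _Cleaned/Set_1_Mud_Pumps_Merged.csv")
// cast() returns a new Column, so withColumn is needed to replace "DateTime"
val dfTs = df.withColumn("DateTime", col("DateTime").cast("timestamp"))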

Thanks, any help would be appreciated.
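
One built-in option for the per-minute resampling asked about here is the `window` function in `org.apache.spark.sql.functions`, which groups rows into fixed (tumbling) time intervals. This is a minimal sketch under assumptions: `DateTime` has already been cast to a timestamp, and the numeric column is called `pressure`, a hypothetical name chosen for illustration. The three sample rows are invented.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, window}

object MinuteWindowSketch {
  // Average the (hypothetical) "pressure" column over tumbling
  // one-minute windows of the "DateTime" timestamp column.
  def perMinute(df: DataFrame): DataFrame =
    df.groupBy(window(col("DateTime"), "1 minute"))
      .agg(avg(col("pressure")).as("avg_pressure"))
      .select(col("window.start").as("minute_start"), col("avg_pressure"))
      .orderBy(col("minute_start"))

  // Runs perMinute on three invented readings and returns the averages in order.
  def sampleAverages(): Seq[Double] = {
    val spark = SparkSession.builder().master("local[*]").appName("minute-window").getOrCreate()
    import spark.implicits._
    try {
      val df = Seq(
        ("2017-01-01 00:00:00", 1.0),
        ("2017-01-01 00:00:30", 3.0),
        ("2017-01-01 00:01:10", 5.0)
      ).toDF("DateTime", "pressure")
        .withColumn("DateTime", col("DateTime").cast("timestamp"))
      val result = perMinute(df)
      result.show(false)
      result.select("avg_pressure").as[Double].collect().toSeq
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = println(sampleAverages())
}
```

Changing `"1 minute"` to `"1 hour"` (or another duration string) changes the interval; `avg` can be swapped for `max`, `sum`, or any other aggregate.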

Re: aggregating data based on timestamp

Do you want to group by timestamp, or do you want to pick the latest timestamp?
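
The two options in this reply give different results. A minimal plain-Scala sketch (no Spark; the `Reading` type and sample data are invented for illustration) contrasts them on in-memory readings: averaging every reading that falls in a minute versus keeping only the latest reading in each minute.

```scala
object LatestVsAverage {
  // Hypothetical reading: seconds since the epoch plus a measured value.
  final case class Reading(epochSeconds: Long, value: Double)

  // Minute bucket via integer (floor) division of the epoch seconds.
  private def minuteOf(r: Reading): Long = Math.floorDiv(r.epochSeconds, 60L)

  // Option 1: aggregate every reading in the minute (here, the mean).
  def averagePerMinute(rs: Seq[Reading]): Map[Long, Double] =
    rs.groupBy(minuteOf).map { case (m, group) => m -> group.map(_.value).sum / group.size }

  // Option 2: keep only the latest reading in each minute.
  def latestPerMinute(rs: Seq[Reading]): Map[Long, Reading] =
    rs.groupBy(minuteOf).map { case (m, group) => m -> group.maxBy(_.epochSeconds) }

  def main(args: Array[String]): Unit = {
    val rs = Seq(Reading(0L, 1.0), Reading(30L, 3.0), Reading(70L, 5.0))
    println(averagePerMinute(rs))
    println(latestPerMinute(rs))
  }
}
```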

Re: aggregating data based on timestamp

Group by floor(timestamp / 60) to aggregate by minute, floor(timestamp / 3600) to aggregate by hour, and so on, with the timestamp expressed in seconds since the epoch.
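
A minimal Spark sketch of this suggestion, assuming `DateTime` has already been cast to a timestamp (as in the question) and a numeric column named `pressure` (a hypothetical name). `unix_timestamp` converts the timestamp to seconds since the epoch and `floor` performs the integer division, so every row in the same minute (or hour) gets the same bucket value. The sample rows are invented.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, floor, from_unixtime, unix_timestamp}

object BucketSketch {
  // Buckets rows by floor(epochSeconds / bucketSeconds): 60 gives minutes,
  // 3600 gives hours. "pressure" is a hypothetical value column.
  def aggregate(df: DataFrame, bucketSeconds: Long): DataFrame =
    df.withColumn("bucket", floor(unix_timestamp(col("DateTime")) / bucketSeconds))
      .groupBy(col("bucket"))
      .agg(avg(col("pressure")).as("avg_pressure"))
      // Multiply back to recover each bucket's start time as a readable string.
      .withColumn("bucket_start", from_unixtime(col("bucket") * bucketSeconds))
      .orderBy(col("bucket"))

  // Runs aggregate on three invented readings and returns the averages in order.
  def sampleAverages(bucketSeconds: Long): Seq[Double] = {
    val spark = SparkSession.builder().master("local[*]").appName("bucket-sketch").getOrCreate()
    import spark.implicits._
    try {
      val df = Seq(
        ("2017-01-01 00:00:00", 1.0),
        ("2017-01-01 00:00:30", 3.0),
        ("2017-01-01 00:01:10", 5.0)
      ).toDF("DateTime", "pressure")
        .withColumn("DateTime", col("DateTime").cast("timestamp"))
      val result = aggregate(df, bucketSeconds)
      result.show(false)
      result.select("avg_pressure").as[Double].collect().toSeq
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = println(sampleAverages(60L))
}
```

With these rows, a bucket size of 60 yields two minute buckets, while 3600 collapses all three readings into a single hour bucket.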
