Created on 02-27-2017 03:37 PM - edited 08-19-2019 02:55 AM
Hi guys, I am trying this source code:
val spark = SparkSession.builder.master("local").appName("my-spark-app").getOrCreate()
val df = spark.read.option("header", true).csv("C:/Users/mhattabi/Desktop/Clean _data/Mud_Pumps _Cleaned/Set_1_Mud_Pumps_Merged.csv")
df("DateTime").cast("timestamp")
df("ADCH_Mud Pumps.db.MudPump.2.On.value").cast("integer")
val result = df.select(
    col("*"),
    date_format(df("DateTime"), "yyyy-MM-dd hh:mm").alias("DateTime"))
  .groupBy(df("DateTime"))
  .agg(avg(df("ADCH_Mud Pumps.db.MudPump.2.On.value")))
result.show(5)
It complains about the attribute, but the attribute really does exist in my data set. What should I do? Thanks
Created 02-27-2017 05:09 PM
In your dataset, do you really have "." in your column names in the csv file? Is it possible for you to cleanse your data (by removing "." from the column names) in your csv file before getting to this step?
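If editing the csv itself is not an option, the same cleansing can be done right after the read. This is just a sketch of the suggestion above (the `df` variable is assumed to be the DataFrame from the question), not code from the thread:

```scala
// Rename every column, replacing "." with "_", so that later
// select/groupBy calls resolve each name as a plain column.
val cleaned = df.columns.foldLeft(df) { (acc, name) =>
  acc.withColumnRenamed(name, name.replace(".", "_"))
}
```

After this, the aggregation can refer to `ADCH_Mud Pumps_db_MudPump_2_On_value` without any escaping.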
Created 02-28-2017 07:30 AM
Hi,
Yes, there is a "." in the column name. Can this cause a problem in such an operation? Thanks
Created 03-03-2017 05:26 AM
Yes. Last I knew, you cannot have "." in your column name. This is still unresolved. Please see the following link.
https://issues.apache.org/jira/browse/SPARK-5632 --> this says it was resolved in 1.4, but it's actually not. It points to the following jira, and that one is still unresolved.
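That said, a workaround that is commonly suggested (my sketch, not from the thread) is to wrap the dotted name in backticks so Spark's resolver treats the whole string as one column name instead of a nested struct path:

```scala
// Backticks make Spark treat "ADCH_Mud Pumps.db.MudPump.2.On.value"
// as a single flat column name rather than a field lookup like a.b.c.
// The column names below are taken from the question's dataset.
val onValue = df.col("`ADCH_Mud Pumps.db.MudPump.2.On.value`").cast("integer")
val result = df
  .withColumn("DateTime", date_format(col("DateTime"), "yyyy-MM-dd HH:mm"))
  .groupBy("DateTime")
  .agg(avg(onValue))
result.show(5)
```

Note this also assigns the results of the casts: in the original code, `df("...").cast(...)` on its own line creates a new Column expression but does not change `df`.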
Created 02-27-2017 05:16 PM
Can you also share a sample of the csv file, like 2-3 rows, so it can be tested?