I'm loading in a DataFrame with a timestamp column and I want to extract the month and year from values in that column.
When specifying a field as TimestampType in the schema, I found that only text in the form "yyyy-mm-dd hh:mm:ss" works without giving an error. Is there a way of specifying the format when reading in a CSV file, such as "mm/dd/yyyy hh:mm:ss"?
If not, and we have to specify the field as StringType, is there a way of converting the format my time is in to the JDBC format? Would that be inefficient compared to just substringing the timestamp as a StringType?
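In recent Spark versions the CSV reader accepts a `timestampFormat` read option (e.g. `spark.read.option("timestampFormat", "MM/dd/yyyy HH:mm:ss")`), so a custom pattern can often be parsed directly into TimestampType. If you do keep the column as StringType, a small `java.time` helper can rewrite it into the JDBC form; the following is a sketch assuming the input looks like "MM/dd/yyyy HH:mm:ss" (the object and function names are mine):

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object ReformatTimestamp {
  // Assumed source pattern; adjust to match your data.
  private val in  = DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss")
  // Target is the JDBC timestamp form "yyyy-MM-dd HH:mm:ss".
  private val out = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  def toJdbc(ts: String): String =
    LocalDateTime.parse(ts, in).format(out)

  def main(args: Array[String]): Unit =
    println(toJdbc("03/15/2017 14:30:00")) // prints 2017-03-15 14:30:00
}
```

A full parse like this validates the value (an invalid date throws), which plain substringing would not; for most workloads the extra cost is negligible next to the I/O of reading the file.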
My dataset contains a timestamp field and I need to extract the year, the month, the day, and the hour from it.
I typed these lines:
spark.udf.register("getCurrentHour", getCurrentHour _)
val hour = spark.sql("select getCurrentHour(payload_MeterReading_IntervalBlock_IReading_endTime) as hour from df")
spark.udf.register("assignTod", assignTod _)
val tod = spark.sql("select assignTod(hour) as tod from timestamps")
The problem is that I'm not good at Scala, so I couldn't figure out the best solution.
These are the two functions I used to extract the hour and assign it to a part of the day.
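For year/month/day/hour, Spark SQL already has built-in `year()`, `month()`, `dayofmonth()`, and `hour()` functions that work on timestamp columns, so UDFs are only needed for the custom bucketing step. A plain-Scala sketch of the two helpers (assuming `endTime` strings look like "yyyy-MM-dd HH:mm:ss", and with my own hour-to-bucket boundaries, which you would adjust):

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object TimestampParts {
  // Assumed input format of the endTime field; adjust to match your data.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Extract the hour (0-23) from a timestamp string.
  def getCurrentHour(ts: String): Int =
    LocalDateTime.parse(ts, fmt).getHour

  // Map an hour to a part of the day; the boundaries here are illustrative.
  def assignTod(hour: Int): String = hour match {
    case h if h >= 5 && h < 12  => "morning"
    case h if h >= 12 && h < 17 => "afternoon"
    case h if h >= 17 && h < 21 => "evening"
    case _                      => "night"
  }

  def main(args: Array[String]): Unit = {
    val hour = getCurrentHour("2017-03-15 14:30:00")
    println(hour)            // prints 14
    println(assignTod(hour)) // prints afternoon
  }
}
```

Once these compile, registering them exactly as in the post (`spark.udf.register("getCurrentHour", TimestampParts.getCurrentHour _)`) should work, provided the DataFrame has first been exposed to SQL with `df.createOrReplaceTempView(...)` under the name used in the query.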