Created 07-12-2016 02:31 AM
I'm loading in a DataFrame with a timestamp column and I want to extract the month and year from values in that column.
When specifying a field as TimestampType in the schema, I found that only text in the form "yyyy-MM-dd HH:mm:ss" works without giving an error. Is there a way to specify the format when reading in a CSV file, like "MM/dd/yyyy HH:mm:ss"?
If not, and we have to specify the field as StringType, is there a way to convert the format my time is in to the JDBC format? Would this be inefficient compared to just substringing the timestamp as a StringType?
Thank you!
Created 07-12-2016 04:09 AM
@jestin ma, I found a similar solution here.
"You can use the date processing functions that were introduced in Spark 1.5. Assuming you have the following data:
val df = Seq((1L, "05/26/2016 01:01:01"), (2L, "#$@#@#")).toDF("id", "dts")
You can use unix_timestamp to parse strings and cast the result to timestamp:
import org.apache.spark.sql.functions.unix_timestamp
val ts = unix_timestamp($"dts","MM/dd/yyyy HH:mm:ss").cast("timestamp")
df.withColumn("ts", ts).show(2, false)
// +---+-------------------+---------------------+
// |id |dts                |ts                   |
// +---+-------------------+---------------------+
// |1  |05/26/2016 01:01:01|2016-05-26 01:01:01.0|
// |2  |#$@#@#             |null                 |
// +---+-------------------+---------------------+
As you can see it covers both parsing and error handling.
In Spark < 1.6 you'll have to use something like this:
unix_timestamp($"dts","MM/dd/yyyy HH:mm:ss").cast("double").cast("timestamp")
or
(unix_timestamp($"dts","MM/dd/yyyy HH:mm:ss")*1000).cast("timestamp")
due to SPARK-11724.
In Spark < 1.5 you should be able to use these with expr and HiveContext."
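One detail worth flagging in the pattern string above: the letters follow Java's SimpleDateFormat/DateTimeFormatter conventions, so case matters (MM is month while mm is minutes; HH is the 24-hour clock while hh is 12-hour). A minimal, Spark-free sketch of the same parse using java.time, with the example value taken from the data above:

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object PatternDemo {
  def main(args: Array[String]): Unit = {
    // Case matters: MM = month, mm = minutes, HH = 24-hour clock
    val fmt = DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss")
    val dt = LocalDateTime.parse("05/26/2016 01:01:01", fmt)
    println(dt)               // 2016-05-26T01:01:01
    println(dt.getMonthValue) // 5
  }
}
```

The same pattern string is what unix_timestamp expects, which is why a mismatched case (e.g. mm for months) silently yields nulls or wrong values rather than an error.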
Created 07-12-2016 05:40 PM
I didn't even see this on Stack Overflow! Thank you.
Created 07-12-2016 08:36 PM
Please note that there are also convenience functions provided in pyspark.sql.functions, such as dayofmonth:
pyspark.sql.functions.dayofmonth(col)
Extract the day of the month of a given date as integer.
Example:
>>> df = sqlContext.createDataFrame([('2015-04-08',)], ['a'])
>>> df.select(dayofmonth('a').alias('day')).collect()
[Row(day=8)]
Created 05-28-2018 08:58 AM
Hello people!
Could anyone help me out?
My dataset contains a timestamp field and I need to extract the year, the month, the day, and the hour from it.
I tried these lines:
training.createOrReplaceTempView("df")
spark.udf.register("getCurrentHour", getCurrentHour _)
val hour = spark.sql("select getCurrentHour(payload_MeterReading_IntervalBlock_IReading_endTime) as hour from df")
spark.udf.register("assignTod", assignTod _)
timestamps_df.createOrReplaceTempView("timestamps")
val tod = spark.sql("select assignTod(hour) as tod from timestamps")
The problem is I'm not good at Scala, so I couldn't figure out the best solution.
These are the two functions I used to extract the hour and assign it to a part of the day:
def assignToDay(hr: Int): String = {
  if (hr >= 7 && hr < 12) "morning"
  else if (hr >= 12 && hr < 14) "lunch"
  else if (hr >= 14 && hr < 18) "afternoon"
  else if (hr >= 18 && hr < 23) "evening"
  else if (hr >= 23 && hr <= 24) "night"
  else if (hr < 7) "night"
  else "error"
}
def getCurrentHour(dateStr: String): Int = {
  // Parses an epoch-milliseconds string; returns 0 if the input is not a valid Long.
  // Note: Date#getHours is deprecated and uses the JVM's default time zone.
  try {
    val date = new java.util.Date(dateStr.toLong)
    date.getHours
  } catch {
    case _: Exception => 0
  }
}
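As a side note, if endTime is an ISO-8601 string rather than epoch milliseconds, java.time can extract all four fields (year, month, day, hour) directly, without going through the deprecated java.util.Date API. A minimal sketch; the timestamp value here is hypothetical, so adjust the parser to your actual field format:

```scala
import java.time.OffsetDateTime

object ExtractDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical ISO-8601 endTime value
    val ts = OffsetDateTime.parse("2016-05-26T01:01:01Z")
    println(ts.getYear)       // 2016
    println(ts.getMonthValue) // 5
    println(ts.getDayOfMonth) // 26
    println(ts.getHour)       // 1
  }
}
```

The same getHour result could then feed a day-part classifier like assignToDay above, either directly or wrapped in a Spark UDF.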