
Spark Scala: split a timestamp into year, month, and hour columns and convert to a DataFrame with three columns



Hello,

My dataset contains a field which is a timestamp.

I need to split this field into three columns (year, month, and hour) and add them to my DataFrame, so that I can work with the machine learning library (indexer, OneHotEncoder).

I'm new to Scala and couldn't figure out how to do it. Here is what I have so far:
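Before reaching for UDFs: if the field can be parsed as a timestamp, Spark's built-in `year`, `month`, and `hour` functions do this directly. A minimal sketch, assuming `training` is the DataFrame and the column holds epoch milliseconds as a string (both assumptions taken from the code in the question):

```scala
import org.apache.spark.sql.functions.{col, from_unixtime, hour, month, year}

// from_unixtime expects seconds, so divide the epoch-millis value by 1000.
val ts = from_unixtime(col("payload_MeterReading_IntervalBlock_IReading_endTime") / 1000)

// Add the three derived columns to the existing DataFrame.
val withParts = training
  .withColumn("year", year(ts))
  .withColumn("month", month(ts))
  .withColumn("hour", hour(ts))
```

The three new columns can then be fed to StringIndexer / OneHotEncoder as usual.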

training.createOrReplaceTempView("df")

spark.udf.register("getCurrentHour", getCurrentHour _)
val hour = spark.sql("SELECT getCurrentHour(payload_MeterReading_IntervalBlock_IReading_endTime) AS hour FROM df")
hour.printSchema()

// Register under the same name the query uses; the original mixed up
// "assignTod", "assignTodDay", and "assignToDay".
spark.udf.register("assignToDay", assignToDay _)
// "df" has no "hour" column, so compose the two UDFs instead.
val toDay = spark.sql("SELECT assignToDay(getCurrentHour(payload_MeterReading_IntervalBlock_IReading_endTime)) AS toDay FROM df")
toDay.show()
import java.util.Date

def getCurrentHour(dateStr: String): Integer = {
  try {
    // Interpret the string as epoch milliseconds.
    val date = new Date(dateStr.toLong)
    date.getHours // deprecated, but kept to match the original code
  } catch {
    case _: Exception => 0 // fall back to hour 0 on a bad value
  }
}
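A possibly cleaner variant of the helper above, assuming (as `getCurrentHour` does) that the string holds epoch milliseconds: it uses `java.time` instead of the deprecated `Date.getHours`, and pins the zone to UTC for reproducibility (swap in `ZoneId.systemDefault()` to match `Date`'s local-time behaviour):

```scala
import java.time.{Instant, ZoneId}

// Sketch with the same contract as getCurrentHour: epoch-millis string in,
// hour of day out, 0 on a value that does not parse as a Long.
def getHourOfDay(dateStr: String): Int =
  try Instant.ofEpochMilli(dateStr.toLong).atZone(ZoneId.of("UTC")).getHour
  catch { case _: NumberFormatException => 0 }
```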




def assignToDay(hr: Integer): String = {
  if (hr >= 7 && hr < 12) "morning"
  else if (hr >= 12 && hr < 14) "lunch"
  else if (hr >= 14 && hr < 18) "afternoon"
  else if (hr >= 18 && hr < 23) "evening"
  else if (hr >= 23 && hr <= 24) "night"
  else if (hr < 7) "night"
  else "error"
}