Convert milliseconds DataFrame column into Unix timestamp using Spark Scala?
- Labels: Apache Spark
Created 11-09-2017 02:48 PM
The events CSV file looks like this:
|display_id|          uuid|document_id|timestamp|platformgeo_location|
|         1|cb8c55702adb93|     379743|       61|                   3|
|         2|79a85fa78311b9|    1794259|       81|                   2|
|         3|822932ce3d8757|    1179111|      182|                   2|
|         4|85281d0a49f7ac|    1777797|      234|                   2|
This is my Spark Scala code:
- import org.joda.time._
- case class flight(display_id: Int, uuid: String, document_id: Int, timestamp: String, platformgeo_location: String)
- val streamdf = sc.textFile("/FileStore/tables/y6ak4fzq1504260076447/events.csv").map(_.split(",")).map(x => flight(x(0).toInt, x(1).toString, x(2).toInt, x(3).toString, x(4).toString)).toDF()
- streamdf.show()
- streamdf.registerTempTable("event1")
- val result = sqlContext.sql("select * from event1 limit 10")
- val addP = (p: Int) => udf( (x: Int) => x + p )
- val stamp = streamdf.withColumn("timestamp", addP(1465876799)($"timestamp")).toDF()
- stamp.show()
- stamp.registerTempTable("stamp")
- new org.joda.time.DateTime(1465876799L * 1000)
- val df = sqlContext.sql("select from_unixtime(timestamp,'YYYY-MM-dd') as 'ts' from stamp")
When I execute the last command (the from_unixtime query that defines val df), I get a type mismatch error.
How can I resolve this problem? Please help me out.
Thanks in advance
swathi.T
Created 11-10-2017 01:36 AM
Since your timestamp field is typed as a string, cast it to bigint or int as your requirements dictate; then from_unixtime will work.
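In the DataFrame API the same cast looks roughly like this (a minimal sketch, assuming the stamp DataFrame from your code above and Spark 1.5+ SQL functions; the ts column name is just illustrative):

import org.apache.spark.sql.functions.{col, from_unixtime}

// Cast the string timestamp column to bigint, then format it as a date string.
val withTs = stamp.withColumn("ts", from_unixtime(col("timestamp").cast("bigint"), "yyyy-MM-dd"))
withTs.select("timestamp", "ts").show()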
Here are the possible outputs for your timestamp value 1465876799; you can check them in the hive (or) beeline shell.
hive> select from_unixtime(1465876799, 'yyyy-MM-dd');
2016-06-13
hive> select from_unixtime(CAST(1465876799000 as int), 'yyyy-MM-dd');
2010-12-21
hive> select from_unixtime(CAST(1465876799000 as bigint), 'yyyy-MM-dd');
48421-10-14
hive> select from_unixtime(CAST(1465876799000/1000 as BIGINT), 'yyyy-MM-dd');
2016-06-13
Error:-
hive> select from_unixtime(CAST(1465876799000 as string), 'yyyy-MM-dd');
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments ''yyyy-MM-dd'': No matching method for class org.apache.hadoop.hive.ql.udf.UDFFromUnixTime with (string, string). Possible choices: _FUNC_(bigint) _FUNC_(bigint, string) _FUNC_(int) _FUNC_(int, string)
As you can see above, casting 1465876799000 as string gives an error; the only accepted argument types are bigint and int.
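If your column actually holds milliseconds, the same divide-by-1000 trick works from Spark SQL as well. A sketch, reusing the stamp table from your code (the nested casts are only there because the column is currently a string):

// Convert a milliseconds value to seconds, then format it as a date.
val fromMillis = sqlContext.sql(
  "select from_unixtime(cast(cast(timestamp as bigint)/1000 as bigint), 'yyyy-MM-dd') as ts from stamp")
fromMillis.show()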
Possible query for your case:
val df = sqlContext.sql("select from_unixtime(cast(timestamp as bigint), 'yyyy-MM-dd') as ts from stamp")
(or)
change the data type in the case class:
case class flight(display_id: Int, uuid: String, document_id: Int, timestamp: BigInt, platformgeo_location: String)
val df = sqlContext.sql("select from_unixtime(timestamp, 'yyyy-MM-dd') as ts from stamp")
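A hedged sketch of that case-class route (it reuses the path and names from your question, omits the addP offset step for brevity, and uses Long for the timestamp field instead of BigInt, since Spark maps Long to BIGINT, which from_unixtime accepts directly):

import sqlContext.implicits._

// Timestamp parsed as Long so the column is already a BIGINT in SQL.
case class FlightTs(display_id: Int, uuid: String, document_id: Int, timestamp: Long, platformgeo_location: String)

val tsDf = sc.textFile("/FileStore/tables/y6ak4fzq1504260076447/events.csv")
  .map(_.split(","))
  .map(x => FlightTs(x(0).toInt, x(1), x(2).toInt, x(3).toLong, x(4)))
  .toDF()
tsDf.registerTempTable("stamp")
val df = sqlContext.sql("select from_unixtime(timestamp, 'yyyy-MM-dd') as ts from stamp")
df.show()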
I have listed all the possible outputs above, tested in the hive shell with the int and bigint data types.
You can pick whichever best fits your case.
