Convert a milliseconds DataFrame column into a Unix timestamp using Spark Scala?
Labels: Apache Spark
Created 11-09-2017 02:48 PM
The events CSV file looks like this:
|display_id|uuid|document_id|timestamp|platform|geo_location|
|1|cb8c55702adb93|379743|61|3| |
|2|79a85fa78311b9|1794259|81|2| |
|3|822932ce3d8757|1179111|182|2| |
|4|85281d0a49f7ac|1777797|234|2| |
This is my Spark Scala code:
import org.joda.time._

case class flight(display_id: Int, uuid: String, document_id: Int, timestamp: String, platformgeo_location: String)

val streamdf = sc.textFile("/FileStore/tables/y6ak4fzq1504260076447/events.csv")
  .map(_.split(","))
  .map(x => flight(x(0).toInt, x(1).toString, x(2).toInt, x(3).toString, x(4).toString))
  .toDF()
streamdf.show()
streamdf.registerTempTable("event1")

val result = sqlContext.sql("select * from event1 limit 10")

val addP = (p: Int) => udf((x: Int) => x + p)
val stamp = streamdf.withColumn("timestamp", addP(1465876799)($"timestamp")).toDF()
stamp.show()
stamp.registerTempTable("stamp")

new org.joda.time.DateTime(1465876799L * 1000)

val df = sqlContext.sql("select from_unixtime(timestamp,'YYYY-MM-dd') as 'ts' from stamp")
When I execute the last command (the val df = sqlContext.sql(...) line), I get a type mismatch error on val df.
How can I resolve this problem? Please help me out.
Thanks in advance,
swathi.T
Created 11-10-2017 01:36 AM
Since your timestamp field is typed as a string, cast it to bigint or int as your requirements dictate; then from_unixtime will work.
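For example, here is a minimal sketch of that cast using the DataFrame API instead of SQL, assuming the streamdf DataFrame from your question with timestamp still stored as a string (from_unixtime here is the function from org.apache.spark.sql.functions):

import org.apache.spark.sql.functions._

// Cast the string column to bigint before formatting; from_unixtime expects seconds.
val withTs = streamdf.withColumn("ts", from_unixtime($"timestamp".cast("bigint"), "yyyy-MM-dd"))
withTs.select("timestamp", "ts").show()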
Possible outputs for your timestamp value 1465876799, which you can check in the hive (or) beeline shell:
hive> select from_unixtime(1465876799, 'yyyy-MM-dd');
2016-06-13
hive> select from_unixtime(CAST(1465876799000 as int), 'yyyy-MM-dd');
2010-12-21
hive> select from_unixtime(CAST(1465876799000 as bigint), 'yyyy-MM-dd');
48421-10-14
hive> select from_unixtime(CAST(1465876799000/1000 as BIGINT), 'yyyy-MM-dd');
2016-06-13
hive> select from_unixtime(CAST(1465876799000 as string), 'yyyy-MM-dd');
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments ''yyyy-MM-dd'': No matching method for class org.apache.hadoop.hive.ql.udf.UDFFromUnixTime with (string, string). Possible choices: _FUNC_(bigint) _FUNC_(bigint, string) _FUNC_(int) _FUNC_(int, string)
As you can see above, I cast 1465876799000 as string, but it gives an error; the only accepted argument types are bigint and int.
Possible query for your case:
val df = sqlContext.sql("select from_unixtime(cast(timestamp as bigint), 'yyyy-MM-dd') as ts from stamp")
(or)
change the data type in the case class:
case class flight(display_id: Int, uuid: String, document_id: Int, timestamp: BigInt, platformgeo_location: String)
val df = sqlContext.sql("select from_unixtime(timestamp, 'yyyy-MM-dd') as ts from stamp")
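If you go that route, here is a rough end-to-end sketch (not tested against your file) that uses Long instead of BigInt, since a Long field becomes a bigint column in Spark and from_unixtime accepts it directly; the names and the offset UDF follow your question's code:

import org.apache.spark.sql.functions._

case class flight(display_id: Int, uuid: String, document_id: Int, timestamp: Long, platformgeo_location: String)

// Parse the timestamp column as Long instead of String.
val streamdf = sc.textFile("/FileStore/tables/y6ak4fzq1504260076447/events.csv")
  .map(_.split(","))
  .map(x => flight(x(0).toInt, x(1), x(2).toInt, x(3).toLong, x(4)))
  .toDF()

// Same offset step as in your question, but over Long values.
val addP = (p: Long) => udf((x: Long) => x + p)
val stamp = streamdf.withColumn("timestamp", addP(1465876799L)($"timestamp"))
stamp.registerTempTable("stamp")

// No cast needed now: the timestamp column is already bigint.
val df = sqlContext.sql("select from_unixtime(timestamp, 'yyyy-MM-dd') as ts from stamp")
df.show()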
I have listed all the possible outputs above by testing them in the Hive shell with the int and bigint data types.
You can pick whichever best fits your case.
