Timestamp column changes format in a CSV file (Spark)
Labels: Apache Spark
Created on 03-02-2017 09:14 AM - edited 08-19-2019 02:09 AM
Hi guys, I am trying to save a DataFrame that contains a timestamp column to a CSV file. The problem is that this column changes format once written to the CSV file. When showing it via df.show I get the correct format,
but when I check the CSV file the format is different.
I also tried something like this, and still got the same problem:
finalresult.coalesce(1).write.option("header",true).option("inferSchema","true").option("dateFormat","yyyy-MM-dd HH:mm:ss").csv("C:/mydata.csv")
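For what it's worth, in Spark 2.x the CSV writer's "dateFormat" option only applies to DateType columns; TimestampType columns are governed by the separate "timestampFormat" option, whose default is an ISO-8601-style pattern. The gap between that default and what df.show prints can be reproduced with plain Java date formatting; a minimal sketch, where the sample instant and the exact default pattern shown are assumptions:

```scala
import java.text.SimpleDateFormat
import java.sql.Timestamp

// A fixed instant to format (hypothetical sample value).
val ts = Timestamp.valueOf("2015-12-06 12:40:00")

// An ISO-8601-style pattern like the CSV writer's timestamp default...
val isoStyle = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss").format(ts)
// ...versus the pattern df.show renders timestamps with.
val showStyle = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(ts)

println(isoStyle)  // 2015-12-06T12:40:00
println(showStyle) // 2015-12-06 12:40:00
```

So the write above would pass .option("timestampFormat", "yyyy-MM-dd HH:mm:ss") rather than "dateFormat" to control how the timestamp column is serialized.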
val spark = SparkSession.builder.master("local").appName("my-spark-app").getOrCreate()
val df = spark.read.option("header", true).option("inferSchema", "true").csv("C:/Users/mhattabi/Desktop/dataTest2.csv")
//val df = spark.read.option("header",true).option("inferSchema","true").csv("C:\dataSet.csv\datasetTest.csv")
// convert all columns to numeric values in order to apply the aggregation function
df.columns.map { c => df.withColumn(c, col(c).cast("int")) }
// add a new column including the new timestamp column
val result2 = df.withColumn("new_time", ((unix_timestamp(col("time")) / 300).cast("long") * 300).cast("timestamp")).drop("time")
val finalresult = result2.groupBy("new_time").agg(result2.drop("new_time").columns.map((_ -> "mean")).toMap).sort("new_time") // agg(avg(all columns..))
finalresult.coalesce(1).write.option("header", true).option("inferSchema", "true").csv("C:/mydata.csv")
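The "new_time" expression above floors each epoch timestamp to a 5-minute (300-second) bucket before casting back to a timestamp. The same integer arithmetic in plain Scala, with a hypothetical epoch value:

```scala
// Epoch seconds for 2015-12-06 12:42:17 UTC (hypothetical sample value).
val epoch = 1449405737L

// Integer division floors, so dividing by 300 and multiplying back
// snaps the instant to the start of its 5-minute bucket.
val bucket = (epoch / 300) * 300

println(bucket)         // 1449405600  (= 2015-12-06 12:40:00 UTC)
println(epoch - bucket) // 137 seconds into the bucket
```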
Created 03-02-2017 04:06 PM
A quick hack would be to use Scala's "substring":
http://alvinalexander.com/scala/scala-string-examples-collection-cheat-sheet
So what you can do is write a UDF, run the "new_time" column through it, and grab up to the precision you want. For example, if you want just "yyyy-MM-dd HH:mm" as seen when you run df.show, your substring code will be
new_time.substring(0, 16)
which will yield "2015-12-06 12:40"
pseudo code
val getDateTimeSplit = udf((new_time: String) => new_time.substring(0, 16))
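As a quick check of the substring indices (the sample string is a hypothetical value as the timestamp might appear in the written CSV), a self-contained sketch:

```scala
// A timestamp as it might appear in the written CSV (hypothetical sample).
val raw = "2015-12-06 12:40:00.0"

// "yyyy-MM-dd HH:mm" is 16 characters long, so the end index is 16:
// substring(0, 15) would cut the string one character short.
val trimmed = raw.substring(0, 16)

println(trimmed) // 2015-12-06 12:40
```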
