<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: TimestampType format for Spark DataFrames in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/TimestampType-format-for-Spark-DataFrames/m-p/123927#M34464</link>
    <description>&lt;P&gt;Please note that there are also convenience functions provided in pyspark.sql.functions, such as dayofmonth:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;pyspark.sql.functions.dayofmonth(col) &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Extract the day of the month of a given date as integer.&lt;/P&gt;&lt;P&gt;Example:&lt;/P&gt;&lt;PRE&gt;&amp;gt;&amp;gt;&amp;gt; df = sqlContext.createDataFrame([('2015-04-08',)], ['a'])
&amp;gt;&amp;gt;&amp;gt; df.select(dayofmonth('a').alias('day')).collect()
[Row(day=8)]
&lt;/PRE&gt;</description>
    <pubDate>Wed, 13 Jul 2016 03:36:35 GMT</pubDate>
    <dc:creator>phargis</dc:creator>
    <dc:date>2016-07-13T03:36:35Z</dc:date>
    <item>
      <title>TimestampType format for Spark DataFrames</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/TimestampType-format-for-Spark-DataFrames/m-p/123924#M34461</link>
      <description>&lt;P&gt;I'm loading a DataFrame with a timestamp column, and I want to extract the month and year from the values in that column.&lt;/P&gt;&lt;P&gt;When specifying a field as TimestampType in the schema, I found that only text in the form "yyyy-mm-dd hh:mm:ss" works without giving an error. Is there a way of specifying the format when reading in a CSV file, like "mm/dd/yyyy hh:mm:ss"?&lt;/P&gt;&lt;P&gt;If not, and we have to specify the field as StringType, is there a way of converting my time's format to JDBC format? Would this be inefficient compared to just taking substrings of the timestamp as a StringType?&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Tue, 12 Jul 2016 09:31:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/TimestampType-format-for-Spark-DataFrames/m-p/123924#M34461</guid>
      <dc:creator>jestinm</dc:creator>
      <dc:date>2016-07-12T09:31:42Z</dc:date>
    </item>
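Editorial note: the question above boils down to parsing a non-ISO timestamp string with an explicit format pattern. A minimal plain-Python illustration of the same idea (the helper name `parse_us_timestamp` is mine, not from the thread; in Spark itself the pattern would be given to a format-aware parser rather than the schema):

```python
from datetime import datetime

def parse_us_timestamp(s):
    """Parse a 'MM/dd/yyyy HH:mm:ss'-style string; return None on failure.

    This mirrors, in plain Python, what any format-aware parser must do:
    the format pattern, not the target timestamp type, decides which
    string layouts are accepted.
    """
    try:
        return datetime.strptime(s, "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None

ts = parse_us_timestamp("05/26/2016 01:01:01")
print(ts.year, ts.month)          # 2016 5
print(parse_us_timestamp("#$@#@#"))  # None
```

Malformed input yields `None` instead of raising, which is the behavior the Spark answers below this post rely on (unparseable rows becoming null).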
    <item>
      <title>Re: TimestampType format for Spark DataFrames</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/TimestampType-format-for-Spark-DataFrames/m-p/123925#M34462</link>
      <description>&lt;P&gt; &lt;A rel="user" href="https://community.cloudera.com/users/11524/jestinm.html" nodeid="11524"&gt;@jestin ma&lt;/A&gt; found a similar solution &lt;A href="http://stackoverflow.com/questions/29844144/better-way-to-convert-a-string-field-into-timestamp-in-spark"&gt;here&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;"You can use date processing functions which have been introduced in Spark 1.5. Assuming you have following data:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;val df =Seq((1L,"05/26/2016 01:01:01"),(2L,"#$@#@#")).toDF("id","dts")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;You can use &lt;CODE&gt;unix_timestamp&lt;/CODE&gt; to parse strings and cast it to timestamp&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import org.apache.spark.sql.functions.unix_timestamp

val ts = unix_timestamp($"dts","MM/dd/yyyy HH:mm:ss").cast("timestamp")

df.withColumn("ts", ts).show(2, false)

// +---+-------------------+---------------------+
// |id |dts                |ts                   |
// +---+-------------------+---------------------+
// |1  |05/26/2016 01:01:01|2016-05-26 01:01:01.0|
// |2  |#$@#@#             |null                 |
// +---+-------------------+---------------------+&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;As you can see, it covers both parsing and error handling.&lt;/P&gt;&lt;P&gt;In Spark &amp;lt; 1.6 you'll have to use something like this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;unix_timestamp($"dts","MM/dd/yyyy HH:mm:ss").cast("double").cast("timestamp")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;or&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;(unix_timestamp($"dts","MM/dd/yyyy HH:mm:ss")*1000).cast("timestamp")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;due to &lt;A href="https://issues.apache.org/jira/browse/SPARK-11724"&gt;SPARK-11724&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;In Spark &amp;lt; 1.5 you should be able to use these with &lt;CODE&gt;expr&lt;/CODE&gt; and &lt;CODE&gt;HiveContext&lt;/CODE&gt;."&lt;/P&gt;</description>
      <pubDate>Tue, 12 Jul 2016 11:09:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/TimestampType-format-for-Spark-DataFrames/m-p/123925#M34462</guid>
      <dc:creator>sunile_manjee</dc:creator>
      <dc:date>2016-07-12T11:09:09Z</dc:date>
    </item>
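Editorial note: `unix_timestamp` in the answer above returns seconds since the Unix epoch, and casting that number to a timestamp is just an epoch conversion. A stdlib-only Python sketch of that final step (the helper name `seconds_to_timestamp` is mine):

```python
from datetime import datetime, timezone

def seconds_to_timestamp(seconds):
    """Convert Unix epoch seconds to an aware UTC datetime, which is
    what casting unix_timestamp's output to a timestamp amounts to."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

dt = seconds_to_timestamp(1464224461)  # epoch seconds for 2016-05-26 01:01:01 UTC
print(dt.isoformat())  # 2016-05-26T01:01:01+00:00
```

The SPARK-11724 workarounds quoted above exist because older Spark versions mishandled the integer-seconds-to-timestamp cast; the underlying arithmetic is the same as this sketch.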
    <item>
      <title>Re: TimestampType format for Spark DataFrames</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/TimestampType-format-for-Spark-DataFrames/m-p/123926#M34463</link>
      <description>&lt;P&gt;I didn't even see this on stackoverflow! Thank you.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jul 2016 00:40:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/TimestampType-format-for-Spark-DataFrames/m-p/123926#M34463</guid>
      <dc:creator>jestinm</dc:creator>
      <dc:date>2016-07-13T00:40:14Z</dc:date>
    </item>
    <item>
      <title>Re: TimestampType format for Spark DataFrames</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/TimestampType-format-for-Spark-DataFrames/m-p/123927#M34464</link>
      <description>&lt;P&gt;Please note that there are also convenience functions provided in pyspark.sql.functions, such as dayofmonth:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;pyspark.sql.functions.dayofmonth(col) &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Extract the day of the month of a given date as integer.&lt;/P&gt;&lt;P&gt;Example:&lt;/P&gt;&lt;PRE&gt;&amp;gt;&amp;gt;&amp;gt; df = sqlContext.createDataFrame([('2015-04-08',)], ['a'])
&amp;gt;&amp;gt;&amp;gt; df.select(dayofmonth('a').alias('day')).collect()
[Row(day=8)]
&lt;/PRE&gt;</description>
      <pubDate>Wed, 13 Jul 2016 03:36:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/TimestampType-format-for-Spark-DataFrames/m-p/123927#M34464</guid>
      <dc:creator>phargis</dc:creator>
      <dc:date>2016-07-13T03:36:35Z</dc:date>
    </item>
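Editorial note: the `dayofmonth` convenience function described above simply surfaces the day component of each date value; `year` and `month` work the same way for the original question. A plain-Python analogue using the stdlib:

```python
from datetime import date

# Mirror of what dayofmonth/month/year extract per row: the components
# of an already-parsed date value.
d = date.fromisoformat("2015-04-08")
print(d.year, d.month, d.day)  # 2015 4 8
```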
    <item>
      <title>Re: TimestampType format for Spark DataFrames</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/TimestampType-format-for-Spark-DataFrames/m-p/123928#M34465</link>
      <description>&lt;P&gt;Hello everyone!&lt;/P&gt;&lt;P&gt;Could anyone help me out?&lt;/P&gt;&lt;P&gt;My dataset contains a timestamp field, and I need to extract the year, the month, the day, and the hour from it.&lt;/P&gt;&lt;P&gt;I tried these lines:&lt;/P&gt;&lt;PRE&gt;training.createOrReplaceTempView("df")
spark.udf.register("getCurrentHour", getCurrentHour _)
val hour = spark.sql("select getCurrentHour(payload_MeterReading_IntervalBlock_IReading_endTime) as hour from df")

spark.udf.register("assignToDay", assignToDay _)
timestamps_df.createOrReplaceTempView("timestamps")
val tod = spark.sql("select assignToDay(hour) as tod from timestamps")

&lt;/PRE&gt;&lt;P&gt;The problem is I'm not good at Scala, so I couldn't figure out the best solution.&lt;/P&gt;&lt;P&gt;These are the two functions I used to extract the hour and assign it to a part of the day:&lt;/P&gt;&lt;PRE&gt;def assignToDay(hr : Integer) : String = {
  if(hr &amp;gt;= 7 &amp;amp;&amp;amp; hr &amp;lt; 12){
    return "morning"
  }else if ( hr &amp;gt;= 12 &amp;amp;&amp;amp; hr &amp;lt; 14) {
    return "lunch"
  } else if ( hr &amp;gt;= 14 &amp;amp;&amp;amp; hr &amp;lt; 18) {
    return "afternoon"
  } else if ( hr &amp;gt;= 18 &amp;amp;&amp;amp; hr &amp;lt; 23) {
    return "evening"
  } else if ( hr &amp;gt;= 23 &amp;amp;&amp;amp; hr &amp;lt;= 24) {
    return "night"
  } else if (  hr &amp;lt; 7) {
    return "night"
  } else {
    return "error"
  }
}&lt;/PRE&gt;
&lt;PRE&gt;def getCurrentHour(dateStr: String) : Integer = {
  // dateStr is expected to hold epoch milliseconds
  try {
    val date = new java.util.Date(dateStr.toLong)
    int2Integer(date.getHours)
  } catch {
    case _: Exception =&amp;gt; 0
  }
}&lt;/PRE&gt;</description>
      <pubDate>Mon, 28 May 2018 15:58:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/TimestampType-format-for-Spark-DataFrames/m-p/123928#M34465</guid>
      <dc:creator>chaimajandoubi5</dc:creator>
      <dc:date>2018-05-28T15:58:16Z</dc:date>
    </item>
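Editorial note: the bucketing logic in the post above can be written without the overlapping branches. A plain-Python sketch of the same time-of-day assignment (the function name `assign_tod` is mine; boundaries follow the Scala post, with anything outside 0-23 treated as an error):

```python
def assign_tod(hr):
    """Bucket an hour of day (0-23) into a named part of the day,
    mirroring the assignToDay helper from the post above."""
    if not 0 <= hr <= 23:
        return "error"
    if 7 <= hr < 12:
        return "morning"
    if 12 <= hr < 14:
        return "lunch"
    if 14 <= hr < 18:
        return "afternoon"
    if 18 <= hr < 23:
        return "evening"
    return "night"  # 23:00 and 00:00-06:59

print(assign_tod(9), assign_tod(13), assign_tod(23))  # morning lunch night
```

Because each range is checked in order and the function returns immediately, no branch can shadow another, which is the pitfall in the original chain of `else if`s.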
  </channel>
</rss>

