<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hawq Timestamp datatype in Spark in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hawq-Timestamp-datatype-in-Spark/m-p/147128#M56633</link>
    <description>&lt;P&gt;Great catch &lt;A rel="user" href="https://community.cloudera.com/users/16599/taruntiwari.html" nodeid="16599"&gt;@Tarun Tiwari&lt;/A&gt;!&lt;/P&gt;</description>
    <pubDate>Tue, 14 Mar 2017 23:47:04 GMT</pubDate>
    <dc:creator>kdunn926</dc:creator>
    <dc:date>2017-03-14T23:47:04Z</dc:date>
    <item>
      <title>Hawq Timestamp datatype in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hawq-Timestamp-datatype-in-Spark/m-p/147124#M56629</link>
      <description>&lt;P&gt;Hello:&lt;/P&gt;&lt;P&gt;I have a table in Hawq that is stored as Parquet. The problem is that reading the Parquet file in Spark returns the timestamp columns as long. The table DDL is:&lt;/P&gt;&lt;P&gt;CREATE TABLE test ( location integer, year integer, week integer, &lt;STRONG&gt;startdateofweek timestamp&lt;/STRONG&gt; without time zone, &lt;STRONG&gt;enddateofweek timestamp&lt;/STRONG&gt; without time zone ) WITH (APPENDONLY=true, ORIENTATION=parquet, COMPRESSTYPE=snappy, OIDS=FALSE ) DISTRIBUTED RANDOMLY;&lt;/P&gt;&lt;P&gt;Note the columns startdateofweek and enddateofweek, which are created as timestamp type. Next, I read this table in Spark using the Parquet file location rather than connecting to Hawq over JDBC:&lt;/P&gt;&lt;P&gt;val path = "/hawq_default/16385/312220/479532/*"&lt;/P&gt;&lt;P&gt;val df = spark.read.parquet(path)&lt;/P&gt;&lt;P&gt;df.printSchema&lt;/P&gt;&lt;P&gt;df.show(2)&lt;/P&gt;&lt;P&gt;This returns the timestamp columns as long rather than timestamp:&lt;/P&gt;&lt;P&gt;|-- location: integer (nullable = true) &lt;/P&gt;&lt;P&gt;|-- year: integer (nullable = true) &lt;/P&gt;&lt;P&gt;|-- week: integer (nullable = true) &lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;|-- startdateofweek: long (nullable = true) &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;|-- enddateofweek: long (nullable = true) &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;+--------+----+----+---------------+---------------+&lt;/P&gt;&lt;P&gt;|location|year|week|startdateofweek| enddateofweek|&lt;/P&gt;&lt;P&gt;+--------+----+----+---------------+---------------+ &lt;/P&gt;&lt;P&gt;| 2290|2015| 19|&lt;STRONG&gt;484012800000000|484531200000000&lt;/STRONG&gt;|&lt;/P&gt;&lt;P&gt;| 2290|2015| 19|484012800000000|484531200000000| &lt;/P&gt;&lt;P&gt;I have tried many different things to convert this long to timestamp, but nothing works. Some of the things I have tried are:&lt;/P&gt;&lt;P&gt;1) Specifying the schema for the dataset using a case class. The columns are still returned as long.&lt;/P&gt;&lt;P&gt;case class myLog(location: Integer, year: Integer,week:Integer, startdateofweek:java.sql.Timestamp, enddateofweek:java.sql.Timestamp)&lt;/P&gt;&lt;P&gt;val path = "/hawq_default/16385/312220/479532/*"&lt;/P&gt;&lt;P&gt;val peopleDS = spark.read.parquet(path).as[myLog]&lt;/P&gt;&lt;P&gt;peopleDS.show(2)&lt;/P&gt;&lt;P&gt;2) Using the from_unixtime function. This returns bogus dates.&lt;/P&gt;&lt;P&gt;select from_unixtime(startdateofweek/1000) as ts from TEST where location = 2290 and week = 19 and year = 2015&lt;/P&gt;
&lt;TABLE&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;17307-10-03 20:00:00&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;17307-10-03 20:00:00&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;&lt;P&gt;Any help in dealing with this issue is much appreciated.&lt;/P&gt;&lt;P&gt;Thanks much&lt;/P&gt;</description>
      <pubDate>Thu, 09 Mar 2017 05:14:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hawq-Timestamp-datatype-in-Spark/m-p/147124#M56629</guid>
      <dc:creator>Viraj_Thakur</dc:creator>
      <dc:date>2017-03-09T05:14:24Z</dc:date>
    </item>
    <item>
      <title>Re: Hawq Timestamp datatype in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hawq-Timestamp-datatype-in-Spark/m-p/147125#M56630</link>
      <description>&lt;P&gt;This looks like a precision issue. Postgres 8.2 (which HAWQ is loosely based on) stores timestamps with microsecond precision, whereas the from_unixtime() function expects a value in seconds, which explains why you're a few centuries in the future &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 09 Mar 2017 05:23:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hawq-Timestamp-datatype-in-Spark/m-p/147125#M56630</guid>
      <dc:creator>kdunn926</dc:creator>
      <dc:date>2017-03-09T05:23:45Z</dc:date>
    </item>
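A quick arithmetic sanity check of the precision point above, sketched in plain Scala (no Spark session needed). Dividing the raw value by 1000 treats microseconds as milliseconds, leaving a seconds count of roughly 15,000 years, which is exactly the year-17307 output shown in the question:

```scala
// Raw value of startdateofweek from the question, in microseconds.
val raw = 484012800000000L

// from_unixtime(raw / 1000) interprets its argument as seconds since
// 1970, so the microsecond value is only scaled down by 1000:
val misreadSeconds = raw / 1000L              // 484,012,800,000 "seconds"
val approxYears = misreadSeconds / 31557600L  // 31557600 seconds per Julian year

// Roughly 15,337 years past 1970, matching the year-17307 rows
// in the question's output.
println(approxYears) // 15337
```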
    <item>
      <title>Re: Hawq Timestamp datatype in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hawq-Timestamp-datatype-in-Spark/m-p/147126#M56631</link>
      <description>&lt;P&gt;Thanks for the response, but converting from microseconds to seconds returns a date, just not the right date.&lt;/P&gt;&lt;P&gt;In Spark SQL:&lt;/P&gt;&lt;P&gt;select from_unixtime(startdateofweek/1000000) as ts from TEST where location = 2290 and week = 19 and year = 2015&lt;/P&gt;&lt;P&gt;Returns:&lt;/P&gt;&lt;P&gt;1985-05-03 20:00:00&lt;/P&gt;&lt;P&gt;1985-05-03 20:00:00&lt;/P&gt;&lt;P&gt;In Postgres, running the query:&lt;/P&gt;&lt;P&gt;select startdateofweek as ts from TEST_PARQUET where location = 2290 and week = 19 and year = 2015&lt;/P&gt;&lt;P&gt;Returns:&lt;/P&gt;&lt;P&gt;2015-05-04 00:00:00&lt;/P&gt;</description>
      <pubDate>Thu, 09 Mar 2017 05:32:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hawq-Timestamp-datatype-in-Spark/m-p/147126#M56631</guid>
      <dc:creator>Viraj_Thakur</dc:creator>
      <dc:date>2017-03-09T05:32:27Z</dc:date>
    </item>
    <item>
      <title>Re: Hawq Timestamp datatype in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hawq-Timestamp-datatype-in-Spark/m-p/147127#M56632</link>
      <description>&lt;P&gt;Hawq follows the Postgres 8.4 convention, in which 2000-01-01 acts as the zero value and timestamp data is stored as an 8-byte signed integer with microsecond precision.&lt;/P&gt;&lt;P&gt;From the Postgres 8.4 documentation:&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.postgresql.org/docs/8.4/static/datatype-datetime.html" target="_blank"&gt;https://www.postgresql.org/docs/8.4/static/datatype-datetime.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;“When timestamp values are stored as eight-byte integers (currently the default), microsecond precision is available over the full range of values. ... timestamp values are stored as seconds before or after midnight 2000-01-01.”&lt;/P&gt;&lt;P&gt;So a Hawq timestamp has a 30-year offset against the Unix epoch (1970-01-01).&lt;/P&gt;&lt;P&gt;The expression below converts startdateofweek in the dataframe back to a timestamp:&lt;/P&gt;&lt;P&gt;df.select((($"startdateofweek" + 946684800000000L + 14400000000L)/1000000).cast(TimestampType)).show()&lt;/P&gt;&lt;P&gt;946684800000000L is the 30-year offset in microseconds.&lt;/P&gt;&lt;P&gt;14400000000L is the 4-hour EST-to-GMT offset in microseconds.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Mar 2017 10:25:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hawq-Timestamp-datatype-in-Spark/m-p/147127#M56632</guid>
      <dc:creator>tarun_tiwari</dc:creator>
      <dc:date>2017-03-14T10:25:32Z</dc:date>
    </item>
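The offset arithmetic in the answer above can be reproduced without a Spark session; here is a minimal sketch in plain Scala using java.time. It converts the raw value to UTC directly, so the extra 14400000000L term (which compensates for an EST session timezone in the Spark expression) is not needed. The sample value is the startdateofweek from the question, and the result matches what Postgres reports, 2015-05-04 00:00:00:

```scala
import java.time.Instant

// Microseconds between the Unix epoch (1970-01-01) and the
// Postgres/HAWQ integer-timestamp epoch (2000-01-01): the 30-year offset.
val hawqEpochOffsetUs = 946684800L * 1000000L

// Convert a raw HAWQ timestamp (microseconds since 2000-01-01)
// to a java.time.Instant (UTC).
def hawqMicrosToInstant(raw: Long): Instant = {
  val us = raw + hawqEpochOffsetUs
  Instant.ofEpochSecond(us / 1000000L, (us % 1000000L) * 1000L)
}

// Raw value from the question's first row.
println(hawqMicrosToInstant(484012800000000L)) // 2015-05-04T00:00:00Z
```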
    <item>
      <title>Re: Hawq Timestamp datatype in Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hawq-Timestamp-datatype-in-Spark/m-p/147128#M56633</link>
      <description>&lt;P&gt;Great catch &lt;A rel="user" href="https://community.cloudera.com/users/16599/taruntiwari.html" nodeid="16599"&gt;@Tarun Tiwari&lt;/A&gt;!&lt;/P&gt;</description>
      <pubDate>Tue, 14 Mar 2017 23:47:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hawq-Timestamp-datatype-in-Spark/m-p/147128#M56633</guid>
      <dc:creator>kdunn926</dc:creator>
      <dc:date>2017-03-14T23:47:04Z</dc:date>
    </item>
  </channel>
</rss>

