<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Python generated parquet timestamp error in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Python-generated-parquet-timestamp-error/m-p/89846#M12249</link>
    <description>&lt;P&gt;Hi!&lt;/P&gt;&lt;P&gt;I can give a quick answer for Impala: reading int64 Parquet timestamps is implemented, but it is a fairly new feature, released in CDH 6.2.&lt;/P&gt;&lt;P&gt;The more widely supported way to store timestamps in Parquet is INT96, so if Pandas can write timestamps that way, both Hive and Impala will be able to read them.&lt;/P&gt;&lt;P&gt;Note that there is more than one way to store a timestamp as int64 in Parquet. The unit can be milliseconds, microseconds, or nanoseconds, and the value can be either normalized to UTC or stored as local time. Which interpretation applies is recorded in the file metadata. As far as I know, supporting all of these formats in Hive is still ongoing work.&lt;/P&gt;&lt;P&gt;If you know which int64 format is used, e.g. microseconds in UTC, you can also read the column as BIGINT and convert it to a timestamp in the query, or create a view that does this conversion.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Csaba&lt;/P&gt;</description>
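    <!--
    The int64 encodings described in the reply differ only in unit and time-zone semantics, so once the metadata is known, the conversion is plain arithmetic. The Python sketch below is illustrative only. `decode_int64_timestamp` and its `unit` and `adjusted_to_utc` parameters are hypothetical names, loosely modeled on the TIMESTAMP unit and isAdjustedToUTC flag in the Parquet format, and are not part of any library. It shows why the unit must be known before a raw int64 can be read as a timestamp. A BIGINT-to-timestamp conversion in a query or view has to perform the same scaling.

```python
from datetime import datetime, timedelta, timezone

# Ticks per second for each int64 unit a Parquet TIMESTAMP column may declare.
# (Hypothetical helper table for illustration, not a library constant.)
_TICKS_PER_SECOND = {"MILLIS": 1_000, "MICROS": 1_000_000, "NANOS": 1_000_000_000}

_EPOCH = datetime(1970, 1, 1)


def decode_int64_timestamp(value: int, unit: str, adjusted_to_utc: bool) -> datetime:
    """Interpret a raw int64 Parquet timestamp (hypothetical helper, not a library API)."""
    ticks = _TICKS_PER_SECOND[unit]
    # Split into whole seconds and a sub-second remainder using integer math,
    # so large nanosecond values do not lose precision through float division.
    seconds, remainder = divmod(value, ticks)
    # Python's datetime resolves only microseconds, so nanoseconds are truncated.
    micros = remainder * 1_000_000 // ticks
    naive = _EPOCH + timedelta(seconds=seconds, microseconds=micros)
    # UTC-normalized values get an explicit UTC tzinfo; local-time values stay naive.
    return naive.replace(tzinfo=timezone.utc) if adjusted_to_utc else naive


if __name__ == "__main__":
    # Same instant as this post's pubDate, stored as microseconds since the epoch.
    print(decode_int64_timestamp(1556813870000000, "MICROS", True))
    # prints "2019-05-02 16:17:50+00:00"
```

    Reading the same number with the wrong unit goes badly wrong. Interpreted as MILLIS, 1556813870000000 falls far outside the range Python's datetime can represent, and the helper raises OverflowError. On the writing side, pyarrow's `pyarrow.parquet.write_table` accepts `use_deprecated_int96_timestamps=True`, and pandas' `DataFrame.to_parquet` forwards extra keyword arguments to the pyarrow engine. That is one way to produce the INT96 layout the reply recommends.
    -->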
    <pubDate>Thu, 02 May 2019 16:17:50 GMT</pubDate>
    <dc:creator>CsabaR</dc:creator>
    <dc:date>2019-05-02T16:17:50Z</dc:date>
  </channel>
</rss>

