<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Unable to read multiple Parquet files in hive - Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-read-multiple-Parquet-files-in-hive/m-p/19410#M3087</link>
    <description>&lt;P&gt;Queries against a partitioned external Hive table backed by Parquet files written with ParquetFileWriter return the first partition's data for every partition (CDH 4.6, Hive 0.10).&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 09:08:34 GMT</pubDate>
    <dc:creator>kanwal</dc:creator>
    <dc:date>2022-09-16T09:08:34Z</dc:date>
    <item>
      <title>Unable to read multiple Parquet files in hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-read-multiple-Parquet-files-in-hive/m-p/19410#M3087</link>
      <description>&lt;P&gt;We are using ParquetFileWriter to generate Parquet files and want to be able to query them in hive. So in hive we have it set up as an external table pointing to the HDFS folder where the Parquet files are located. This all works great.&lt;/P&gt;&lt;P&gt;Next we tried to set up a partitioned table, so we changed the writer to generate multiple folders based on the partition key Year. The external table in hive has been updated with a PARTITIONED BY clause. We've also manually added the partitions in hive using ALTER TABLE statements.&lt;/P&gt;&lt;P&gt;Now on querying the table we are getting incorrect results, and it appears that the contents of the first Parquet file loaded are returned for all partitions. So if we query for year 2012 we get 5 records, and we get the same for 2013 and 2014. If we restart the hive shell and query directly for 2014, we get the correct result for that partition only, and subsequent queries then all return the same data from the 2014 partition.&lt;/P&gt;&lt;P&gt;We are using CDH 4.6 with Hive 0.10.&lt;/P&gt;&lt;P&gt;Files were created with the following dependencies:&lt;/P&gt;&lt;P&gt;&amp;lt;dependency&amp;gt;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;lt;groupId&amp;gt;org.apache.hive&amp;lt;/groupId&amp;gt;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;lt;artifactId&amp;gt;hive-exec&amp;lt;/artifactId&amp;gt;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;lt;version&amp;gt;0.10.0&amp;lt;/version&amp;gt;&lt;BR /&gt;&amp;lt;/dependency&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;lt;dependency&amp;gt;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;lt;groupId&amp;gt;com.twitter&amp;lt;/groupId&amp;gt;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;lt;artifactId&amp;gt;parquet-hive-bundle&amp;lt;/artifactId&amp;gt;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;lt;version&amp;gt;1.4.0&amp;lt;/version&amp;gt;&lt;BR /&gt;&amp;lt;/dependency&amp;gt;&lt;/P&gt;&lt;P&gt;create external table sinet_test&lt;BR /&gt;(&lt;BR /&gt;datatimestamp INT,&lt;BR /&gt;serverid INT&lt;BR /&gt;)&lt;BR /&gt;PARTITIONED BY (year STRING)&lt;BR /&gt;ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'&lt;BR /&gt;STORED AS&lt;BR /&gt;INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'&lt;BR /&gt;OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'&lt;BR /&gt;LOCATION '/tmp/sinet/writer';&lt;/P&gt;&lt;P&gt;Can someone please help look into this?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Kanwal&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:08:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-read-multiple-Parquet-files-in-hive/m-p/19410#M3087</guid>
      <dc:creator>kanwal</dc:creator>
      <dc:date>2022-09-16T09:08:34Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to read multiple Parquet files in hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-read-multiple-Parquet-files-in-hive/m-p/19590#M3088</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;I think there was a bug related to reading incorrect footers, which was fixed in &lt;A href="https://issues.apache.org/jira/browse/HIVE-5783" target="_blank"&gt;HIVE-5783&lt;/A&gt;.&lt;BR /&gt;&lt;BR /&gt;Can you try upgrading to CDH 5 and reproducing?</description>
      <pubDate>Wed, 01 Oct 2014 19:04:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-read-multiple-Parquet-files-in-hive/m-p/19590#M3088</guid>
      <dc:creator>brock</dc:creator>
      <dc:date>2014-10-01T19:04:20Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to read multiple Parquet files in hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-read-multiple-Parquet-files-in-hive/m-p/19870#M3089</link>
      <description>&lt;P&gt;Thanks. The bug seems to be related to doing SELECT * vs. selecting individual columns. Now I'm able to query individual columns, but running a query that requires an MR job fails with the following error:&lt;/P&gt;&lt;P&gt;Diagnostic Messages for this Task:&lt;/P&gt;&lt;P&gt;java.io.IOException: java.lang.reflect.InvocationTargetException&lt;BR /&gt;at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)&lt;BR /&gt;at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)&lt;BR /&gt;at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:372)&lt;BR /&gt;at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.&amp;lt;init&amp;gt;(HadoopShimsSecure.java:319)&lt;BR /&gt;at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:433)&lt;BR /&gt;at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:540)&lt;BR /&gt;at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:394)&lt;BR /&gt;at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)&lt;BR /&gt;at org.apache.hadoop.mapred.Child$4.run(Child.java:268)&lt;BR /&gt;at java.security.AccessController.doPrivileged(Native&lt;/P&gt;&lt;P&gt;Any suggestions?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Kanwal&lt;/P&gt;</description>
      <pubDate>Wed, 08 Oct 2014 19:09:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-read-multiple-Parquet-files-in-hive/m-p/19870#M3089</guid>
      <dc:creator>kanwal</dc:creator>
      <dc:date>2014-10-08T19:09:59Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to read multiple Parquet files in hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-read-multiple-Parquet-files-in-hive/m-p/19974#M3090</link>
      <description>That's a very generic error. Can you look at the task logs to see if there is a longer stack trace?</description>
      <pubDate>Thu, 09 Oct 2014 21:25:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-read-multiple-Parquet-files-in-hive/m-p/19974#M3090</guid>
      <dc:creator>brock</dc:creator>
      <dc:date>2014-10-09T21:25:47Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to read multiple Parquet files in hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-read-multiple-Parquet-files-in-hive/m-p/19976#M3091</link>
      <description>I did look into the task logs and couldn't find any additional information.&lt;BR /&gt;&lt;BR /&gt;It's the same stack trace in the log as well.&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;Kanwal</description>
      <pubDate>Thu, 09 Oct 2014 21:29:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-read-multiple-Parquet-files-in-hive/m-p/19976#M3091</guid>
      <dc:creator>kanwal</dc:creator>
      <dc:date>2014-10-09T21:29:20Z</dc:date>
    </item>
  </channel>
</rss>

