<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Cannot query parquet file generated by Spark in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-query-parquet-file-generated-by-Spark/m-p/37264#M18952</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I generated a Parquet file with Spark, and I can read its contents in Spark.&lt;/P&gt;&lt;P&gt;I created an external table over the Parquet file using the following syntax and then altered the table to add the partition, but SELECT * FROM test_browser returns NULL for every row and column.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;CREATE EXTERNAL TABLE test_browser&lt;BR /&gt;(&lt;BR /&gt;fld1 string,&lt;BR /&gt;fld2 string,&lt;BR /&gt;FileName string,&lt;BR /&gt;LoadDate string,&lt;BR /&gt;Checksum string,&lt;BR /&gt;RecordId string&lt;BR /&gt;)&lt;BR /&gt;PARTITIONED BY (fname string)&lt;BR /&gt;ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'&lt;BR /&gt;STORED AS&lt;BR /&gt;INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'&lt;BR /&gt;OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'&lt;BR /&gt;LOCATION 'hdfs://nameservice1/temp/dims/browser';&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;ALTER TABLE test_browser ADD PARTITION (fname='browser.parquet')&lt;BR /&gt;LOCATION 'hdfs://nameservice1/temp/dims/browser/browser.parquet';&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any pointers on how to fix this? If you need additional info, I'll add it.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 10:03:32 GMT</pubDate>
    <dc:creator>MKSmith</dc:creator>
    <dc:date>2022-09-16T10:03:32Z</dc:date>
    <item>
      <title>Cannot query parquet file generated by Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-query-parquet-file-generated-by-Spark/m-p/37264#M18952</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I generated a Parquet file with Spark, and I can read its contents in Spark.&lt;/P&gt;&lt;P&gt;I created an external table over the Parquet file using the following syntax and then altered the table to add the partition, but SELECT * FROM test_browser returns NULL for every row and column.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;CREATE EXTERNAL TABLE test_browser&lt;BR /&gt;(&lt;BR /&gt;fld1 string,&lt;BR /&gt;fld2 string,&lt;BR /&gt;FileName string,&lt;BR /&gt;LoadDate string,&lt;BR /&gt;Checksum string,&lt;BR /&gt;RecordId string&lt;BR /&gt;)&lt;BR /&gt;PARTITIONED BY (fname string)&lt;BR /&gt;ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'&lt;BR /&gt;STORED AS&lt;BR /&gt;INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'&lt;BR /&gt;OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'&lt;BR /&gt;LOCATION 'hdfs://nameservice1/temp/dims/browser';&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;ALTER TABLE test_browser ADD PARTITION (fname='browser.parquet')&lt;BR /&gt;LOCATION 'hdfs://nameservice1/temp/dims/browser/browser.parquet';&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any pointers on how to fix this? If you need additional info, I'll add it.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:03:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-query-parquet-file-generated-by-Spark/m-p/37264#M18952</guid>
      <dc:creator>MKSmith</dc:creator>
      <dc:date>2022-09-16T10:03:32Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot query parquet file generated by Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-query-parquet-file-generated-by-Spark/m-p/37318#M18953</link>
      <description>&lt;P&gt;Found the problem. The Hive schema I was using was different from the schema of the Parquet file. I recreated the Hive table with the correct columns, and that fixed it.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Feb 2016 16:54:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-query-parquet-file-generated-by-Spark/m-p/37318#M18953</guid>
      <dc:creator>MKSmith</dc:creator>
      <dc:date>2016-02-11T16:54:37Z</dc:date>
    </item>
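    <!--
A minimal sketch of the fix described in the reply above, written in Python for illustration only. It assumes that Hive's Parquet SerDe matches table columns to file columns by name (its default behavior), so a declared column with no counterpart in the file reads as NULL. The helpers schema_mismatches and external_table_ddl, and all column values in the usage section, are hypothetical; in practice the file's column list would come from the Parquet file itself, for example from spark.read.parquet(path).schema in Spark.

```python
# Minimal sketch (hypothetical helpers, not a Hive or Spark API): compare the
# columns declared in a Hive table with those stored in a Parquet file, then
# build DDL that matches the file.


def schema_mismatches(hive_cols, parquet_cols):
    """Return (declared_not_in_file, in_file_not_declared) as sorted lists.

    Names are compared case-insensitively because Hive stores column
    names in lower case.
    """
    hive = {c.lower() for c in hive_cols}
    parquet = {c.lower() for c in parquet_cols}
    return sorted(hive - parquet), sorted(parquet - hive)


def external_table_ddl(table, columns, location, partition_col=None):
    """Build a CREATE EXTERNAL TABLE statement for a Parquet directory.

    columns is a list of (name, hive_type) pairs taken from the file schema.
    STORED AS PARQUET is shorthand for the SERDE/INPUTFORMAT/OUTPUTFORMAT
    clauses (Hive 0.13 and later).
    """
    cols = ",\n".join(f"  {name} {htype}" for name, htype in columns)
    ddl = f"CREATE EXTERNAL TABLE {table} (\n{cols}\n)"
    if partition_col:
        ddl += f"\nPARTITIONED BY ({partition_col} string)"
    ddl += f"\nSTORED AS PARQUET\nLOCATION '{location}'"
    return ddl


if __name__ == "__main__":
    # Hypothetical file schema; in practice read it from the Parquet file.
    file_schema = [("fld1", "string"), ("filename", "string"), ("loaddate", "string")]
    declared = ["fld1", "fld2", "FileName", "LoadDate", "Checksum", "RecordId"]

    missing, extra = schema_mismatches(declared, [n for n, _ in file_schema])
    print("Declared in Hive but absent from the file:", missing)
    print("Present in the file but not declared:", extra)
    print(external_table_ddl("test_browser", file_schema,
                             "hdfs://nameservice1/temp/dims/browser",
                             partition_col="fname"))
```

Generating the column list from the file, rather than typing it by hand, keeps the table definition aligned with what Spark actually wrote, which is the step that resolved the issue in this thread.
    -->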
    <item>
      <title>Re: Cannot query parquet file generated by Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-query-parquet-file-generated-by-Spark/m-p/37321#M18954</link>
      <description>&lt;P&gt;Great to see that you resolved the issue. Feel free to mark your last comment as the solution in case it can help others in the future. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 11 Feb 2016 17:12:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-query-parquet-file-generated-by-Spark/m-p/37321#M18954</guid>
      <dc:creator>cjervis</dc:creator>
      <dc:date>2016-02-11T17:12:16Z</dc:date>
    </item>
  </channel>
</rss>

