<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: HIVE / SparkSQL '.parquet not a SequenceFile ' in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIVE-SparkSQL-parquet-not-a-SequenceFile/m-p/113615#M16554</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2399/francis.html" nodeid="2399"&gt;@Francis McGregor-Macdonald&lt;/A&gt; the correct response is to call in the big guns :). &lt;A rel="user" href="https://community.cloudera.com/users/332/vshukla.html" nodeid="332"&gt;@vshukla&lt;/A&gt; &lt;A rel="user" href="https://community.cloudera.com/users/528/rsriharsha.html" nodeid="528"&gt;@Ram Sriharsha&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 03 Feb 2016 04:54:54 GMT</pubDate>
    <dc:creator>aervits</dc:creator>
    <dc:date>2016-02-03T04:54:54Z</dc:date>
    <item>
      <title>HIVE / SparkSQL '.parquet not a SequenceFile '</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIVE-SparkSQL-parquet-not-a-SequenceFile/m-p/113611#M16550</link>
      <description>&lt;P&gt;Using the sandbox, I have saved a Parquet file as a table with:&lt;/P&gt;&lt;PRE&gt;df.write.format('parquet').mode('overwrite').saveAsTable(myfile)&lt;/PRE&gt;&lt;P&gt;followed by:&lt;/P&gt;&lt;PRE&gt;sqlContext.refreshTable(myfile)&lt;/PRE&gt;&lt;P&gt;When I attempt to query the table with SparkSQL or Hive, I get the error:&lt;/P&gt;&lt;P&gt;{"message":"H170 Unable to fetch results. java.io.IOException: java.io.IOException: hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/myfile/part-r-00000-5dc24bf0-23ef-4f3c-a1fc-42928761592d.gz.parquet not a SequenceFile [ERROR_STATUS]","status":500,"trace":"org.apache.ambari.view.hive.client.HiveErrorStatusException: H170 Unable to fetch results. java.io.IOException: java.io.IOException: &lt;/P&gt;&lt;P&gt;....&lt;/P&gt;&lt;P&gt;This issue started after I replaced the Parquet file underlying the original df and attempted to rebuild.&lt;/P&gt;&lt;P&gt;When I run df.head(10), I can still see the DataFrame contents.&lt;/P&gt;&lt;P&gt;I have tried manually deleting the Parquet and Hive files under the warehouse, but even after they are deleted, the issue recurs when I resave the table.&lt;/P&gt;&lt;P&gt;I have set sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")&lt;/P&gt;&lt;P&gt;I have tried os.environ["HADOOP_USER_NAME"] = 'hdfs'&lt;/P&gt;&lt;P&gt;I have tried unpersisting the DataFrame.&lt;/P&gt;&lt;P&gt;I have tried changing the permissions with os.system('hdfs fs -chmod -R 777 hdfs://apps/hive/warehouse')&lt;/P&gt;&lt;P&gt;I can't seem to clear this issue. I have seen the steps above suggested as resolutions, but none have helped me, and I still cannot access the data via Hive or SparkSQL.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jan 2016 19:30:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIVE-SparkSQL-parquet-not-a-SequenceFile/m-p/113611#M16550</guid>
      <dc:creator>francis1</dc:creator>
      <dc:date>2016-01-27T19:30:31Z</dc:date>
    </item>
    <item>
      <title>Re: HIVE / SparkSQL '.parquet not a SequenceFile '</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIVE-SparkSQL-parquet-not-a-SequenceFile/m-p/113612#M16551</link>
      <description>&lt;P&gt;This is a long shot, but I had some trouble with Parquet and Hive in the past, and one change that fixed my problem was switching to ORC. The new Spark version supports ORC files, and Hive is optimized for ORC. Could you save your data as ORC and run your Spark SQL query again?&lt;/P&gt;&lt;PRE&gt;df.write.format("orc")...&lt;/PRE&gt;</description>
      <pubDate>Wed, 27 Jan 2016 19:36:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIVE-SparkSQL-parquet-not-a-SequenceFile/m-p/113612#M16551</guid>
      <dc:creator>jstraub</dc:creator>
      <dc:date>2016-01-27T19:36:46Z</dc:date>
    </item>
    <item>
      <title>Re: HIVE / SparkSQL '.parquet not a SequenceFile '</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIVE-SparkSQL-parquet-not-a-SequenceFile/m-p/113613#M16552</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2399/francis.html" nodeid="2399"&gt;@Francis McGregor-Macdonald&lt;/A&gt; are you still having issues with this? Can you accept the best answer or provide your own solution?&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 03:36:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIVE-SparkSQL-parquet-not-a-SequenceFile/m-p/113613#M16552</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-03T03:36:35Z</dc:date>
    </item>
    <item>
      <title>Re: HIVE / SparkSQL '.parquet not a SequenceFile '</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIVE-SparkSQL-parquet-not-a-SequenceFile/m-p/113614#M16553</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/393/aervits.html" nodeid="393"&gt;@Artem Ervits&lt;/A&gt; yes, I am still having the issue. I have moved on to other things, though. What is the correct response in that scenario?&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 04:54:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIVE-SparkSQL-parquet-not-a-SequenceFile/m-p/113614#M16553</guid>
      <dc:creator>francis1</dc:creator>
      <dc:date>2016-02-03T04:54:40Z</dc:date>
    </item>
    <item>
      <title>Re: HIVE / SparkSQL '.parquet not a SequenceFile '</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIVE-SparkSQL-parquet-not-a-SequenceFile/m-p/113615#M16554</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2399/francis.html" nodeid="2399"&gt;@Francis McGregor-Macdonald&lt;/A&gt; the correct response is to call in the big guns :). &lt;A rel="user" href="https://community.cloudera.com/users/332/vshukla.html" nodeid="332"&gt;@vshukla&lt;/A&gt; &lt;A rel="user" href="https://community.cloudera.com/users/528/rsriharsha.html" nodeid="528"&gt;@Ram Sriharsha&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 04:54:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIVE-SparkSQL-parquet-not-a-SequenceFile/m-p/113615#M16554</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-03T04:54:54Z</dc:date>
    </item>
    <item>
      <title>Re: HIVE / SparkSQL '.parquet not a SequenceFile '</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIVE-SparkSQL-parquet-not-a-SequenceFile/m-p/113616#M16555</link>
      <description>&lt;P&gt;I think this may be related to &lt;A href="https://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/%3CCAAswR-5=az1SPxo8EaQvOs2JMh=V82zMfAz67PqGy+CQqrrc=Q@mail.gmail.com%3E" target="_blank"&gt;https://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/%3CCAAswR-5=az1SPxo8EaQvOs2JMh=V82zMfAz67PqGy+CQqrrc=Q@mail.gmail.com%3E&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Which mode is your spark-shell running in: yarn-cluster or yarn-client?&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 09:47:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HIVE-SparkSQL-parquet-not-a-SequenceFile/m-p/113616#M16555</guid>
      <dc:creator>vshukla</dc:creator>
      <dc:date>2016-02-03T09:47:34Z</dc:date>
    </item>
  </channel>
</rss>

