
HIVE / SparkSQL '.parquet not a SequenceFile '

francis1 — Wed, 27 Jan 2016 19:30:31 GMT

Using the sandbox I have saved a parquet file as a table with:

df.write.format('parquet').mode('overwrite').saveAsTable(myfile)

followed by:

sqlContext.refreshTable(myfile)

When I attempt to query the file with SparkSQL or Hive, I get the error:

{"message":"H170 Unable to fetch results. java.io.IOException: java.io.IOException: hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/myfile/part-r-00000-5dc24bf0-23ef-4f3c-a1fc-42928761592d.gz.parquet not a SequenceFile [ERROR_STATUS]","status":500,"trace":"org.apache.ambari.view.hive.client.HiveErrorStatusException: H170 Unable to fetch results. java.io.IOException: java.io.IOException:

....

The issue started after I replaced the Parquet file underlying the original df and attempted to rebuild.

When I run df.head(10) I can see the dataframe.

I have tried manually deleting the Parquet and Hive files under the warehouse, but even after they are deleted the issue recurs when I resave the table.

I have called sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")

I have tried os.environ["HADOOP_USER_NAME"] = 'hdfs'

I have tried unpersisting the dataframe

I have tried changing the permissions with os.system('hdfs fs -chmod -R 777 hdfs://apps/hive/warehouse')

I can't seem to clear out this issue. I have seen resolutions with the above but none have helped me. I can't seem to get back to being able to access the data via Hive or SparkSQL.
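One way to narrow this down is to check whether the part file itself is valid Parquet, since the error only says the reader expected a SequenceFile. The following is a minimal sketch, assuming a part file has first been copied out of HDFS to the local filesystem; the helper name sniff_format and the local path are hypothetical. It relies only on the leading magic bytes of each format: Parquet files begin with PAR1, SequenceFiles with SEQ, and ORC files with ORC.

```python
# Leading magic bytes of common Hadoop file formats.
MAGIC = {
    b"PAR1": "parquet",
    b"SEQ": "sequencefile",
    b"ORC": "orc",
}


def sniff_format(path):
    """Guess a local file's format from its first bytes (hypothetical helper)."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return "unknown"


# Usage, assuming a local copy of one part file (path is illustrative):
# print(sniff_format("/tmp/part-r-00000.gz.parquet"))
```

If this reports parquet, the data file is probably intact, and the mismatch is more likely in the input format registered for the table than in the file itself.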

Re: HIVE / SparkSQL '.parquet not a SequenceFile '

jstraub — Wed, 27 Jan 2016 19:36:46 GMT

This is a long shot, but I had some trouble with Parquet and Hive in the past, and one change that fixed my problem was switching to ORC. Newer Spark versions support ORC files, and Hive is optimized for ORC. Could you save your data as ORC and run your Spark SQL query again?

df.write.format("orc")...
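As a rough sketch of that suggestion, the writer chain from the original post can be reused with "orc" in place of "parquet". The wrapper name save_as_orc_table is hypothetical, and the overwrite mode and saveAsTable call are assumed to carry over unchanged from the question; writing ORC from Spark of this era is also assumed to need a Hive-enabled context.

```python
def save_as_orc_table(df, table_name):
    """Write df as an ORC-backed table, replacing any existing one.

    Hypothetical wrapper: df is expected to be a Spark DataFrame and
    table_name the name of the table, as in the original post.
    """
    # Same chain as in the question, with only the format changed.
    df.write.format("orc").mode("overwrite").saveAsTable(table_name)
```

Calling save_as_orc_table(df, myfile) would then replace the earlier Parquet save; whether it clears the error depends on how the existing table is registered in the metastore.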

Re: HIVE / SparkSQL '.parquet not a SequenceFile '

aervits — Wed, 03 Feb 2016 03:36:35 GMT

@Francis McGregor-Macdonald are you still having issues with this? Can you accept the best answer or provide your own solution?

Re: HIVE / SparkSQL '.parquet not a SequenceFile '

francis1 — Wed, 03 Feb 2016 04:54:40 GMT

@Artem Ervits yes, I am still having the issue. I have moved on to other things, though. What is the correct response in that scenario?

Re: HIVE / SparkSQL '.parquet not a SequenceFile '

aervits — Wed, 03 Feb 2016 04:54:54 GMT

@Francis McGregor-Macdonald the correct response is to call in the big guns :). @vshukla @Ram Sriharsha

Re: HIVE / SparkSQL '.parquet not a SequenceFile '

vshukla — Wed, 03 Feb 2016 09:47:34 GMT

I think this may be related to https://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/%3CCAAswR-5=az1SPxo8EaQvOs2JMh=V82zMfAz67PqGy+CQqrrc=Q@mail.gmail.com%3E

What is your spark-shell mode? Yarn-cluster or yarn-client?