Created 01-27-2016 11:30 AM
Using the sandbox, I have saved a DataFrame as a Parquet table with:
df.write.format('parquet').mode('overwrite').saveAsTable(myfile)
followed by:
sqlContext.refreshTable(myfile)
When I attempt to query the table with Spark SQL or Hive, I get this error:
{"message":"H170 Unable to fetch results. java.io.IOException: java.io.IOException: hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/myfile/part-r-00000-5dc24bf0-23ef-4f3c-a1fc-42928761592d.gz.parquet not a SequenceFile [ERROR_STATUS]","status":500,"trace":"org.apache.ambari.view.hive.client.HiveErrorStatusException: H170 Unable to fetch results. java.io.IOException: java.io.IOException:
....
This issue started after I first replaced the Parquet file underlying the original DataFrame and attempted to rebuild the table.
When I run df.head(10) I can see the DataFrame contents.
I have tried manually deleting the Parquet files and the Hive files under the warehouse directory, but even after they are deleted, the issue recurs as soon as I resave the table.
I have also tried:
- setting sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")
- setting os.environ["HADOOP_USER_NAME"] = 'hdfs'
- unpersisting the DataFrame
- changing permissions with os.system('hdfs dfs -chmod -R 777 /apps/hive/warehouse')
I can't seem to clear this issue. I have seen resolutions that use the steps above, but none has helped, and I still cannot get back to accessing the data via Hive or Spark SQL.
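For reference, the whole sequence I have been running looks roughly like this (a sketch only; sandbox paths, with 'myfile' standing in for the actual table name):

import os

# run the writes as the hdfs superuser to rule out permission problems
os.environ["HADOOP_USER_NAME"] = "hdfs"

# tell Spark not to use its built-in Parquet reader for Hive metastore tables
sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")

# drop the old table and remove any leftover warehouse files before rewriting
sqlContext.sql("DROP TABLE IF EXISTS myfile")
os.system("hdfs dfs -rm -r -f /apps/hive/warehouse/myfile")

# rewrite the table and refresh the cached metadata
df.write.format("parquet").mode("overwrite").saveAsTable("myfile")
sqlContext.refreshTable("myfile")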
Created 01-27-2016 11:36 AM
This is a long shot, but I had some trouble with Parquet and Hive in the past, and one change that fixed my problem was switching to ORC. Newer Spark versions support ORC files, and Hive is optimized for ORC. Could you save your data as ORC and run your Spark SQL again?
df.write.format("orc")...
Created 02-02-2016 07:36 PM
@Francis McGregor-Macdonald are you still having issues with this? Can you accept the best answer or provide your own solution?
Created 02-02-2016 08:54 PM
@Artem Ervits yes, I'm still having the issue, though I have moved on to other things. What is the correct response in that scenario?
Created 02-02-2016 08:54 PM
@Francis McGregor-Macdonald the correct response is to call in the big guns :). @vshukla @Ram Sriharsha
Created 02-03-2016 01:47 AM
I think this may be related to https://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/%3CCAAswR-5=az1SPxo8EaQvOs2JMh=V82z...
What is your spark-shell mode, yarn-cluster or yarn-client?
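If you are not sure, a quick way to check from the shell (assuming a live SparkContext named sc, PySpark on Spark 1.x):

# prints e.g. 'yarn-client' or 'yarn-cluster' on Spark 1.x
print(sc.master)

# on versions that support it, this returns 'client' or 'cluster'
print(sc.getConf().get("spark.submit.deployMode", "not set"))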