Member since 01-27-2016 · 2 Posts · 2 Kudos Received · 0 Solutions
02-02-2016 08:54 PM
@Artem Ervits Yes, I am still having the issue, though I have moved on to other things. What is the correct response in that scenario?
01-27-2016 11:30 AM
2 Kudos
Using the sandbox, I saved a DataFrame as a Parquet-backed Hive table with:

df.write.format('parquet').mode('overwrite').saveAsTable(myfile)

followed by:

sqlContext.refreshTable(myfile)

When I attempt to query the table with SparkSQL or Hive, I get the error:

{"message":"H170 Unable to fetch results. java.io.IOException: java.io.IOException: hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/myfile/part-r-00000-5dc24bf0-23ef-4f3c-a1fc-42928761592d.gz.parquet not a SequenceFile [ERROR_STATUS]","status":500,"trace":"org.apache.ambari.view.hive.client.HiveErrorStatusException: H170 Unable to fetch results. java.io.IOException: java.io.IOException: ....

The issue started after I replaced the Parquet file underlying the original df and attempted to rebuild the table. The DataFrame itself is fine: df.head(10) shows the data. Things I have tried, none of which cleared the issue:

- Manually deleting the Parquet files and the Hive files under the warehouse; the error returns as soon as I resave the table.
- sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")
- os.environ["HADOOP_USER_NAME"] = 'hdfs'
- Unpersisting the DataFrame.
- Changing the permissions with os.system('hdfs dfs -chmod -R 777 /apps/hive/warehouse')

I have seen resolutions involving the steps above, but none have helped; I still cannot get back to accessing the data via Hive or SparkSQL.
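For reference, here is a minimal, self-contained sketch of the sequence described above, assuming a Spark 1.x HiveContext bound to sqlContext (as on the sandbox) and using placeholder names ('myfile' for the table, '/path/to/source.parquet' for the input); the DROP TABLE step is there so the metastore cannot retain a stale storage format for the table:

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="parquet-table-repro")
sqlContext = HiveContext(sc)  # Hive-backed context, required for saveAsTable to hit the metastore

# Load the source data (placeholder path).
df = sqlContext.read.parquet("/path/to/source.parquet")

# Drop any stale table definition first, then recreate it from the DataFrame
# and refresh Spark's cached metadata for it.
sqlContext.sql("DROP TABLE IF EXISTS myfile")
df.write.format("parquet").mode("overwrite").saveAsTable("myfile")
sqlContext.refreshTable("myfile")

# Verify the table is queryable again through SparkSQL.
sqlContext.sql("SELECT * FROM myfile LIMIT 10").show()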
Labels:
- Apache Hive
- Apache Spark