Support Questions

thunderhemu · ‎08-09-2017

I have created a Hive table, and its data is in the following location:

/apps/hive/warehouse/temp.db/test2/c5=56/000000_0

When I query the Hive table from Spark, I get a java.io.FileNotFoundException.

Here is the log:

Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://si1/apps/hive/warehouse/temp.db/test1/c5=56/part-00000 does not exist.
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
    ... 103 more
Caused by: java.io.FileNotFoundException: File hdfs://si1/apps/hive/warehouse/temp.db/test1/c5=56/part-00000 does not exist.
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1062)
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1040)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:985)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:981)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:981)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1713)
    at org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedStatus(Hadoop23Shims.java:667)
    at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:361)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:634)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:620)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

I found the problem: Spark is looking for a part file whose name starts with part-*, whereas the data file in that location is named 000000_0.

Spark version - 1.6.2

HDP version - 2.5.0.0-1245

Please advise

MFP · ‎08-09-2017

Hi, can you try saving a sample table to Hive from Spark? Then try to re-read that table and see whether you are able to read it. Regards, Fahim

thunderhemu · ‎08-09-2017

I am afraid I don't follow: if I can't read the data from Hive into a df, how can I save the data into a table?

MFP · ‎08-10-2017

Hi Hemanth, what I am assuming is that you have already created the Hive table and are trying to read it from Spark. What I am suggesting is that you can also create a Hive table from Spark, using Spark SQL. Try creating a small Hive table with Spark and then reading it back. This will show whether Spark is working correctly with Hive, and whether the issue is with the specific table you posted about. Regards, Fahim
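
A minimal sketch of the round trip suggested here, assuming PySpark and a Hive-enabled context (on Spark 1.6 that would be a HiveContext). The function name round_trip_check and the table name default.spark_roundtrip_check are made up for illustration and do not come from the thread.

```python
def round_trip_check(sql_context, table="default.spark_roundtrip_check"):
    """Write a two-row table through Spark, read it back, and compare.

    sql_context is assumed to be a Hive-enabled context (HiveContext on
    Spark 1.6). The default table name is hypothetical.
    """
    expected = [(1, "a"), (2, "b")]

    # Build a tiny DataFrame and save it as a table registered in the metastore.
    df = sql_context.createDataFrame(expected, ["id", "name"])
    df.write.mode("overwrite").saveAsTable(table)

    # Read the table back through the metastore and compare the rows.
    actual = sorted(tuple(row) for row in sql_context.table(table).collect())
    return actual == expected
```

From a pyspark shell on a Hive-enabled build, this could be called as round_trip_check(sqlContext). If it returns True while the original table still fails, the problem is more likely specific to that table than to the Spark and Hive setup.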

thunderhemu · ‎08-10-2017

I am able to read it now. I just repaired the table, and it is working fine. Thanks
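
The post does not say which command was used for the repair. One common way to repair a partitioned Hive table is MSCK REPAIR TABLE, so the sketch below assumes that statement and assumes it is issued through a Hive-enabled Spark context; the same statement can also be run directly in Hive. The helper name repair_and_list_partitions is hypothetical.

```python
def repair_and_list_partitions(sql_context, table):
    """Run MSCK REPAIR TABLE on `table`, then return its partition names.

    Assumption: sql_context is a Hive-enabled context that passes these
    statements through to Hive. The thread does not confirm that this is
    the repair that was actually performed.
    """
    # Ask the metastore to pick up partitions that exist on HDFS
    # but are not registered for the table.
    sql_context.sql("MSCK REPAIR TABLE {0}".format(table))

    # List the partitions the metastore knows about after the repair.
    rows = sql_context.sql("SHOW PARTITIONS {0}".format(table)).collect()
    return [row[0] for row in rows]
```

For the table in the log above, the call would look like repair_and_list_partitions(sqlContext, "temp.test1"), where the database and table names are read off the warehouse path in the stack trace.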

MFP · ‎08-10-2017

@HEMANTH KUMAR RATAKONDA

Can you please let us know how you determined that the table was corrupted and needed repair? It will help.

Regards,

Fahim

balavignesh_nag · ‎08-10-2017

@HEMANTH KUMAR RATAKONDA

The Spark configuration was not pointing to the right Hadoop configuration directory. Set the value of HADOOP_CONF_DIR in Spark's spark-env.sh. If Spark does not point to the proper Hadoop configuration directory, it might result in a similar error.
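
A quick way to sanity-check this setting is to look at what HADOOP_CONF_DIR resolves to in the environment Spark is launched from. The sketch below is only an illustration: the function names are made up, and the choice of core-site.xml and hdfs-site.xml as the files to look for is an assumption about what a usable Hadoop configuration directory contains.

```python
import os


def missing_hadoop_conf_files(conf_dir, required=("core-site.xml", "hdfs-site.xml")):
    """Return the names in `required` that are not files inside conf_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(conf_dir, name))]


def check_hadoop_conf_dir(env=None):
    """Describe the state of HADOOP_CONF_DIR in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    conf_dir = env.get("HADOOP_CONF_DIR")
    if not conf_dir:
        return "HADOOP_CONF_DIR is not set"
    if not os.path.isdir(conf_dir):
        return "HADOOP_CONF_DIR points to a missing directory: " + conf_dir
    missing = missing_hadoop_conf_files(conf_dir)
    if missing:
        return "missing in " + conf_dir + ": " + ", ".join(missing)
    return "ok"
```

Calling check_hadoop_conf_dir() in the same shell environment that launches Spark returns "ok" only when the variable is set, the directory exists, and both files are present.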

thunderhemu · ‎08-10-2017

I am able to read files from HDFS; the problem was with the Hive table alone.
