Created 08-09-2017 11:49 AM
I have a Hive table, and its data is in the following location:
/apps/hive/warehouse/temp.db/test2/c5=56/000000_0
When I query the Hive table from Spark I get a java.io.FileNotFoundException.
Here is the log:
Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://si1/apps/hive/warehouse/temp.db/test1/c5=56/part-00000 does not exist.
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
	... 103 more
Caused by: java.io.FileNotFoundException: File hdfs://si1/apps/hive/warehouse/temp.db/test1/c5=56/part-00000 does not exist.
	at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1062)
	at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1040)
	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:985)
	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:981)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:981)
	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1713)
	at org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedStatus(Hadoop23Shims.java:667)
	at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:361)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:634)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:620)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
I found the problem: Spark is looking for part files named part-*, whereas the data files in that location are named 000000_0.
Spark version: 1.6.2
HDP version: 2.5.0.0-1245
Please advise.
Created 08-09-2017 12:20 PM
Hi, can you try saving a sample table to Hive from Spark? Then try to re-read that table and see if you are able to read it. Regards, Fahim
Created 08-09-2017 12:25 PM
I am afraid that when I can't even read the data into a DataFrame from Hive, how can I save data into a table?
Created 08-10-2017 07:31 AM
Hi Hemanth, what I am assuming is that you have already created the Hive table and are trying to read it from Spark. What I am suggesting is that from Spark you can also create a Hive table using Spark SQL. Try to create a small Hive table from Spark and then try to read it back. If that works, it proves that your Spark-Hive integration is working correctly, and that the issue is specific to the table you posted about. Regards, Fahim
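For anyone following along, that sanity check could look something like this from a Spark SQL session backed by a HiveContext (the table name spark_smoke_test is made up for illustration):

```sql
-- Create a tiny Hive table from Spark (CTAS), then read it back.
-- If this round-trip works, Spark-Hive integration is healthy and
-- the problem is specific to the original table.
CREATE TABLE spark_smoke_test AS SELECT 1 AS id, 'a' AS name;
SELECT * FROM spark_smoke_test;
```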
Created 08-10-2017 09:25 AM
I am able to read it now; I just repaired the table and it is working fine. Thanks.
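For readers hitting the same symptom: "repaired the table" most likely refers to Hive's metastore repair command, which re-syncs partition metadata with what is actually on HDFS. This is an assumption based on the error described, not something the poster confirmed:

```sql
-- Re-sync the metastore's partition information with the directories
-- and files actually present on HDFS. Stale entries pointing at old
-- file names can cause FileNotFoundException at read time.
MSCK REPAIR TABLE temp.test2;
```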
Created 08-10-2017 06:16 PM
Can you please let us know how you determined that the table was corrupted and needed repair? It will help others.
Regards,
Fahim
Created 08-10-2017 09:10 AM
The Spark configuration was not pointing to the right Hadoop configuration directory. Set the value of HADOOP_CONF_DIR in spark-env.sh. If Spark does not point to the proper Hadoop configuration directory, it can result in a similar error.
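A minimal sketch of that setting in spark-env.sh; the path shown is the usual Hadoop client-config location on HDP and may differ on your cluster:

```shell
# In $SPARK_HOME/conf/spark-env.sh
# Point Spark at the cluster's Hadoop client configuration
# (core-site.xml, hdfs-site.xml). /etc/hadoop/conf is an assumed
# default; verify the actual path on your nodes.
export HADOOP_CONF_DIR=/etc/hadoop/conf
```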
Created 08-10-2017 10:06 AM
I am able to read files from HDFS directly; the problem was with the Hive table alone.