
Unable to query in spark

Explorer

I have created a Hive table, and its data is in the following location:

/apps/hive/warehouse/temp.db/test2/c5=56/000000_0

When I query the Hive table in Spark, I get a java.io.FileNotFoundException.

Here is the log:

Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://si1/apps/hive/warehouse/temp.db/test1/c5=56/part-00000 does not exist.
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
    ... 103 more
Caused by: java.io.FileNotFoundException: File hdfs://si1/apps/hive/warehouse/temp.db/test1/c5=56/part-00000 does not exist.
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1062)
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1040)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:985)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:981)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:981)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1713)
    at org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedStatus(Hadoop23Shims.java:667)
    at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:361)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:634)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:620)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

I found the problem: Spark is looking for a file whose name starts with part-*, whereas the data file in that location is named 000000_0.

Spark version: 1.6.2

HDP version: 2.5.0.0-1245

Please advise.


Explorer

Hi, can you try saving a sample table to Hive from Spark? Then try to re-read the table and see whether you are able to read it. Regards, Fahim

Explorer

If I can't read the data from Hive into a DataFrame, how can I save the data into a table?

Explorer

Hi Hemanth, what I am assuming is that you have already created the Hive table and are trying to read it from Spark. What I am suggesting is that you can also create a Hive table from Spark using Spark SQL. Try creating a small Hive table with Spark and then reading it back. This will show whether Spark is working correctly with Hive and whether the issue is specific to the table you posted. Regards, Fahim
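
The round-trip check suggested here could look roughly like the sketch below. It assumes PySpark with a HiveContext passed in as sql_context; the table name temp.spark_smoke_test and the helper name hive_round_trip are hypothetical and not from the thread. Note that saveAsTable writes in Spark's default data source format, so this only shows that Spark can write to and read from the metastore, not that it can read every Hive-created table.

```python
def hive_round_trip(sql_context, table_name):
    """Write two rows to table_name through Spark, read them back,
    and return the number of rows read."""
    rows = [(1, "a"), (2, "b")]
    df = sql_context.createDataFrame(rows, ["id", "name"])
    # Overwrite so the check can be repeated without manual cleanup.
    df.write.mode("overwrite").saveAsTable(table_name)
    return sql_context.table(table_name).count()


# Usage in a Spark 1.6 shell (assumes sqlContext is a HiveContext and the
# hypothetical table name is free to use):
#   hive_round_trip(sqlContext, "temp.spark_smoke_test")
```

If the count comes back as expected, Spark's Hive integration is probably fine and the problem is more likely specific to the original table.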

Explorer

I am able to read it now. I just repaired the table, and it is working fine. Thanks.
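
The post does not say which command was used for the repair. In Hive, repairing a table usually means MSCK REPAIR TABLE, which resyncs the partitions registered in the metastore with the directories on HDFS. A minimal sketch, assuming the table is temp.test1 (taken from the path in the stack trace) and that statements are run through whatever SQL runner is at hand; the helper names are hypothetical:

```python
def repair_statements(table_name):
    """Return HiveQL commonly used to resync a table's partitions with HDFS."""
    return [
        "MSCK REPAIR TABLE {0}".format(table_name),
        "SHOW PARTITIONS {0}".format(table_name),
    ]


def repair_table(run_sql, table_name):
    """Run each statement through run_sql and return the statements issued."""
    issued = []
    for stmt in repair_statements(table_name):
        run_sql(stmt)
        issued.append(stmt)
    return issued


# Usage with Spark 1.6 (assumes an existing HiveContext named sqlContext):
#   repair_table(sqlContext.sql, "temp.test1")
#   sqlContext.refreshTable("temp.test1")  # drop Spark's cached table metadata
```

The same statements can be run from the Hive CLI or beeline instead; refreshing the table in Spark afterwards may help if Spark is still holding an old file listing.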

Explorer

@HEMANTH KUMAR RATAKONDA

Can you please let us know how you determined that the table was corrupted and needed repair? It will help.

Regards,

Fahim


@HEMANTH KUMAR RATAKONDA

The Spark configuration may not be pointing to the right Hadoop configuration directory. Set HADOOP_CONF_DIR in spark-env.sh to the correct directory; when Spark does not point to the proper Hadoop configuration directory, it can fail with a similar error.
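
A quick way to check this on the machine that launches Spark is sketched below. It assumes that a usable Hadoop configuration directory contains at least core-site.xml and hdfs-site.xml; the function name check_hadoop_conf_dir is hypothetical.

```python
import os


def check_hadoop_conf_dir(env=None):
    """Return a list of problems found with HADOOP_CONF_DIR (empty if none)."""
    env = os.environ if env is None else env
    conf_dir = env.get("HADOOP_CONF_DIR")
    if not conf_dir:
        return ["HADOOP_CONF_DIR is not set"]
    if not os.path.isdir(conf_dir):
        return ["HADOOP_CONF_DIR is not a directory: " + conf_dir]
    problems = []
    # Assumption: these two files are the minimum needed to reach HDFS.
    for name in ("core-site.xml", "hdfs-site.xml"):
        if not os.path.isfile(os.path.join(conf_dir, name)):
            problems.append("missing " + name)
    return problems


if __name__ == "__main__":
    for problem in check_hadoop_conf_dir() or ["no problems found"]:
        print(problem)
```

This only inspects the environment of the current process; the value that Spark actually uses is the one exported in spark-env.sh.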

Explorer

I am able to read files from HDFS; the problem was only with the Hive table.