
Unable to query in spark

New Contributor

I have created a Hive table, and its data is in the following location:

/apps/hive/warehouse/temp.db/test2/c5=56/000000_0

When I query the Hive table from Spark, I get a java.io.FileNotFoundException.

Here is the log:

Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://si1/apps/hive/warehouse/temp.db/test1/c5=56/part-00000 does not exist.
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
    ... 103 more
Caused by: java.io.FileNotFoundException: File hdfs://si1/apps/hive/warehouse/temp.db/test1/c5=56/part-00000 does not exist.
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1062)
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1040)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:985)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:981)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:981)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1713)
    at org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedStatus(Hadoop23Shims.java:667)
    at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:361)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:634)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:620)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

I found the problem: the query is looking for a part file whose name starts with part-*, whereas the data file in that location is named 000000_0.

Spark version: 1.6.2

HDP version: 2.5.0.0-1245

Please advise
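A minimal sketch of the comparison described in the question: pull the missing path out of the log text and check whether its file name appears among the files that are really in the partition directory. The helper names `missing_paths` and `is_present` are hypothetical, and the regular expression assumes the "File <path> does not exist" wording seen in the trace above. Note that the trace names `test1` while the location quoted in the question is under `test2`, so it may be worth confirming which table is being queried.

```python
import posixpath
import re

# Assumes the message format seen in the trace: "File <path> does not exist."
_MISSING_FILE = re.compile(r"File (\S+) does not exist")


def missing_paths(log_text):
    """Return the distinct paths that the log reports as missing, sorted."""
    return sorted(set(_MISSING_FILE.findall(log_text)))


def is_present(missing_path, actual_names):
    """True if the missing path's file name is in the directory listing."""
    return posixpath.basename(missing_path) in actual_names
```

Here `actual_names` would be the file names shown by listing the partition directory, for example with `hdfs dfs -ls`. In this thread the log asks for `part-00000` while the directory holds `000000_0`, so `is_present` would return `False`.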

7 REPLIES

Re: Unable to query in spark

Explorer

Hi, can you try saving a sample table to Hive from Spark? Then try to read that table back and see whether you are able to read it. Regards, Fahim

Re: Unable to query in spark

New Contributor

I'm afraid I don't follow: if I can't read the data from Hive into a DataFrame, how can I save the data into a table?

Re: Unable to query in spark

Explorer

Hi Hemanth, what I am assuming is that you have already created a Hive table and are trying to read it from Spark. What I am suggesting is that you can also create a Hive table from Spark, using Spark SQL. Try creating a small Hive table with Spark and then reading it back. If that works, it shows that Spark is working correctly with Hive and that the issue lies with the specific table in your post. Regards, Fahim
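The round trip suggested above could look roughly like the sketch below. It assumes `ctx` is a `HiveContext` (the Hive-aware entry point in Spark 1.6) and uses the DataFrame writer rather than a SQL `CREATE TABLE`; the function name `round_trip_check` and the table name `spark_roundtrip_check` are hypothetical and chosen only for illustration.

```python
def round_trip_check(ctx, table_name="spark_roundtrip_check"):
    """Write a tiny table through Spark, read it back, and compare row counts.

    ctx is assumed to be a pyspark.sql.HiveContext (Spark 1.6).
    table_name is a hypothetical scratch table, overwritten on each run.
    """
    rows = [(1, "a"), (2, "b")]
    df = ctx.createDataFrame(rows, ["id", "val"])

    # Save the sample data as a table registered in the Hive metastore.
    df.write.mode("overwrite").saveAsTable(table_name)

    # Read the same table back and confirm that every row is visible.
    return ctx.table(table_name).count() == len(rows)
```

If this returns `True` while the original table still fails, that points at the specific table rather than at the Spark and Hive integration in general.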

Re: Unable to query in spark

New Contributor

I am able to read it now. I just repaired the table, and it is working fine. Thanks.
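The post does not say which command was used for the repair. One common option in Hive is `MSCK REPAIR TABLE`, which re-syncs the partition metadata in the metastore with the directories on HDFS; the sketch below assumes that is what was meant. `ctx` is again assumed to be a `HiveContext`, and `repair_and_count` is a hypothetical helper name. If the Spark version in use rejects the statement, the same `MSCK` command can be run from the Hive CLI or beeline instead.

```python
def repair_and_count(ctx, database, table):
    """Repair partition metadata, then re-run a simple query on the table.

    Assumption: the "repair" in the thread was Hive's MSCK REPAIR TABLE.
    ctx is assumed to be a pyspark.sql.HiveContext (Spark 1.6).
    """
    # Switch to the database first so the table name can stay unqualified.
    ctx.sql("USE {0}".format(database))

    # Ask Hive to re-sync the table's partition metadata with HDFS.
    ctx.sql("MSCK REPAIR TABLE {0}".format(table))

    # Re-run the read that failed earlier; collect() returns a list of rows.
    result = ctx.sql("SELECT COUNT(*) FROM {0}".format(table)).collect()
    return result[0][0]
```

For the table in the stack trace this would be called as `repair_and_count(ctx, "temp", "test1")`.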

Re: Unable to query in spark

Explorer

@HEMANTH KUMAR RATAKONDA

Can you please let us know how you determined that the table was corrupted and needed repair? It will help others.

Regards,

Fahim

Re: Unable to query in spark

@HEMANTH KUMAR RATAKONDA

The Spark configuration was not pointing to the right Hadoop configuration directory. Set HADOOP_CONF_DIR in spark-env.sh to the correct directory. If Spark does not point to the proper Hadoop configuration directory, it can result in a similar error.
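As a quick sanity check of that setting, a sketch like the one below reports which of the usual client configuration files are missing from a given directory. The helper name `missing_hadoop_conf_files` is hypothetical, and the list of expected files is an assumption: `core-site.xml` and `hdfs-site.xml` are the standard Hadoop client files, and a given cluster may need others as well.

```python
import os

# Assumption: these two standard Hadoop client files should be present.
EXPECTED_FILES = ("core-site.xml", "hdfs-site.xml")


def missing_hadoop_conf_files(conf_dir):
    """Return the expected config files that are absent from conf_dir.

    If conf_dir is unset or is not a directory, every file counts as missing.
    """
    if not conf_dir or not os.path.isdir(conf_dir):
        return list(EXPECTED_FILES)
    return [name for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(conf_dir, name))]
```

Calling `missing_hadoop_conf_files(os.environ.get("HADOOP_CONF_DIR"))` in the same environment that launches Spark returns an empty list when both files are found.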

Re: Unable to query in spark

New Contributor

I am able to read files from HDFS; the problem was with the Hive table alone.