pyspark.sql.utils.AnalysisException: u'Table not found:

On CDH 5.12.x with Spark 1.6.0, I get an error when running a Python script through Hue/Oozie/Spark with master=yarn and deploy mode=cluster. The same script runs successfully when submitted with spark-submit from a terminal.

2017-10-29 14:41:12,416 [Thread-8] INFO org.apache.hadoop.hive.metastore.MetaStoreDirectSql - Using direct SQL, underlying DB is DERBY
2017-10-29 14:41:12,417 [Thread-8] INFO org.apache.hadoop.hive.metastore.ObjectStore - Initialized ObjectStore
2017-10-29 14:41:12,515 [Thread-8] WARN org.apache.hadoop.hive.metastore.ObjectStore - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.1.0-cdh5.12.1
2017-10-29 14:41:12,641 [Thread-8] WARN org.apache.hadoop.hive.metastore.ObjectStore - Failed to get database default, returning NoSuchObjectException
2017-10-29 14:41:12,764 [Thread-8] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - Added admin role in metastore
2017-10-29 14:41:12,766 [Thread-8] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - Added public role in metastore
2017-10-29 14:41:12,873 [Thread-8] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - No user is added in admin role, since config is empty
2017-10-29 14:41:12,964 [Thread-8] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=get_all_functions from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
2017-10-29 14:41:12,966 [Thread-8] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_all_functions
2017-10-29 14:41:12,966 [Thread-8] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=hive ip=unknown-ip-addr cmd=get_all_functions
2017-10-29 14:41:12,967 [Thread-8] INFO DataNucleus.Datastore - The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
2017-10-29 14:41:13,136 [Thread-8] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=get_all_functions start=1509302472964 end=1509302473136 duration=172 from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=0 retryCount=0 error=false>
2017-10-29 14:41:13,819 [Thread-8] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=get_table from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
2017-10-29 14:41:13,820 [Thread-8] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=xyzdb tbl=testdata1
2017-10-29 14:41:13,820 [Thread-8] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=hive ip=unknown-ip-addr cmd=get_table : db=xyzdb tbl=testdata1
2017-10-29 14:41:13,841 [Thread-8] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=get_table start=1509302473819 end=1509302473841 duration=22 from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=0 retryCount=-1 error=true>

Traceback (most recent call last):
File "example4.py", line 11, in <module>
gctbl = hive_context.sql("SELECT * FROM xyzdb.testdata1")
File "/yarn/nm/usercache/hive/appcache/application_1509052489118_0076/container_1509052489118_0076_02_000001/pyspark.zip/pyspark/sql/context.py", line 580, in sql
File "/yarn/nm/usercache/hive/appcache/application_1509052489118_0076/container_1509052489118_0076_02_000001/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/yarn/nm/usercache/hive/appcache/application_1509052489118_0076/container_1509052489118_0076_02_000001/pyspark.zip/pyspark/sql/utils.py", line 51, in deco
pyspark.sql.utils.AnalysisException: u'Table not found: `xyzdb`.`testdata1`; line 1 pos 22'
2017-10-29 14:41:13,948 [Driver] ERROR org.apache.spark.deploy.yarn.ApplicationMaster - User application exited with status 1
2017-10-29 14:41:13,949 [Driver] INFO org.apache.spark.deploy.yarn.ApplicationMaster - Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
2017-10-29 14:41:13,951 [main] ERROR org.apache.spark.deploy.yarn.ApplicationMaster - Uncaught exception:
org.apache.spark.SparkUserAppException: User application exited with 1
at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:88)
at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:552)
2017-10-29 14:41:13,954 [Thread-4] INFO org.apache.spark.SparkContext - Invoking stop() from shutdown hook
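
One hint in the log above is the line "Using direct SQL, underlying DB is DERBY". A HiveContext that finds no hive-site.xml falls back to a local embedded Derby metastore, which would not contain xyzdb. The following is a minimal sketch of a log check for that pattern; `diagnose` is a hypothetical helper written for illustration, not part of Spark or Hive, and the marker strings are simply copied from the log in this post.

```python
import re

# Marker copied from the metastore log line in this post.
DERBY_MARKER = "underlying DB is DERBY"
# Matches the AnalysisException message, with or without backticks.
TABLE_NOT_FOUND = re.compile(r"Table not found: `?(\w+)`?\.`?(\w+)`?")


def diagnose(log_text):
    """Summarise hints in a driver log (hypothetical helper, for illustration)."""
    uses_derby = DERBY_MARKER in log_text
    match = TABLE_NOT_FOUND.search(log_text)
    missing = "{}.{}".format(match.group(1), match.group(2)) if match else None
    return {
        "embedded_derby": uses_derby,
        "missing_table": missing,
        # Both signals together suggest the driver did not see hive-site.xml.
        "likely_missing_hive_site": uses_derby and missing is not None,
    }
```

If both signals are present, it is worth checking whether hive-site.xml is actually shipped to the driver container.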

 

1 REPLY

I was able to resolve this by passing the location of hive-site.xml to the Spark action, as described below.

In Hue, go to Query -> Scheduler -> Workflow and drag the Spark action to the step below.

Add the following parameters:

Jar/py name: example4.py

FILES: /user/someuser/example4.py

Options list: --files /etc/hive/conf.cloudera.hive/hive-site.xml

 

Then click the gears icon and set the following properties:

Spark master: yarn

Mode: cluster

App name: MySpark
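
For comparison with the terminal run, the Hue settings above correspond roughly to a spark-submit invocation. The sketch below assembles that command line in Python. It assumes the script is available locally as example4.py, and `build_spark_submit` is a hypothetical helper. The only spark-submit options it uses are the standard `--master`, `--deploy-mode`, `--name` and `--files`.

```python
import shlex


def build_spark_submit(script, hive_site, app_name="MySpark"):
    """Return the spark-submit argument list mirroring the Hue Spark action.

    Hypothetical helper for illustration; paths are supplied by the caller.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--name", app_name,
        # Ship hive-site.xml to the driver so HiveContext finds the real metastore.
        "--files", hive_site,
        script,
    ]


if __name__ == "__main__":
    cmd = build_spark_submit(
        "example4.py",  # assumed local path to the script
        "/etc/hive/conf.cloudera.hive/hive-site.xml",
    )
    # Print a shell-safe command line instead of executing it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the command rather than running it keeps the sketch safe to try on a machine without a cluster.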

 

After that, when the Spark job runs in Hue/Oozie, Hue reports the status as KILLED, but the actual job URL shows that the job ran successfully and the data is displayed. This looks like a bug in Hue: it cannot find hive-site.xml and reports job status=KILLED even though the job succeeded.