spark-submit and hive tables - 'Table not found'

I need to submit a Python process which uses Spark. This process needs to access Hive tables, but is unable to find them.

To schedule the process, we use spark-submit, e.g.:

spark-submit pyspark_helloworld.py

This is the code (the first five lines were added so that the process can run outside the pyspark command line, that is, via spark-submit):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
conf1 = SparkConf().setAppName("hellospark")
sc1 = SparkContext(conf=conf1)
sqlCtx = SQLContext(sc1)

print("Hello spark")
documents = sqlCtx.sql("SELECT count(*) as cnt FROM ext_contents")
print("Number of documents found:")
for content in documents.collect():
    print(content.cnt)


And this is the output:

Hello spark
Traceback (most recent call last):
  File "/home/bigdataquanam/pyspark_helloworld.py", line 10, in <module>
    documents = sqlCtx.sql("SELECT count(*) as cnt FROM ext_contents")
  File "/usr/hdp/2.4.2.0-258/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 580, in sql
  File "/usr/hdp/2.4.2.0-258/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/usr/hdp/2.4.2.0-258/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 51, in deco
pyspark.sql.utils.AnalysisException: u'Table not found: ext_contents;'
INFO SparkContext: Invoking stop() from shutdown hook

Any ideas? Thanks!

1 ACCEPTED SOLUTION

@Fernando Lopez Bello

You need a HiveContext to access Hive tables. A plain SQLContext does not read the Hive metastore, so it cannot see them.

from pyspark.sql import HiveContext 
sqlCtx = HiveContext(sc1)
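
For reference, here is a minimal sketch of how the whole script might look with that change applied. It assumes Spark 1.6 (the version the HDP 2.4.2 paths in the traceback suggest). The helper count_rows, the main function, and passing the table name on the command line are illustrative choices, not part of the original script.

```python
import sys


def count_rows(sql_ctx, table_name):
    """Return the row count of table_name using the given SQL/Hive context."""
    query = "SELECT count(*) AS cnt FROM {0}".format(table_name)
    return sql_ctx.sql(query).collect()[0].cnt


def main(table_name):
    # Imported here so the module can be loaded where pyspark is not on the path.
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import HiveContext

    conf1 = SparkConf().setAppName("hellospark")
    sc1 = SparkContext(conf=conf1)
    # HiveContext (not SQLContext) is what makes Hive tables visible.
    sqlCtx = HiveContext(sc1)
    try:
        print("Hello spark")
        print("Number of documents found:")
        print(count_rows(sqlCtx, table_name))
    finally:
        sc1.stop()


if __name__ == "__main__":
    # Hypothetical calling convention: the table name is the only argument.
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: spark-submit pyspark_helloworld.py <table_name>")
```

With this layout the job would be submitted as spark-submit pyspark_helloworld.py ext_contents. On Spark 2.x and later, HiveContext is deprecated in favor of SparkSession.builder.enableHiveSupport().getOrCreate().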

3 REPLIES

You should use HiveContext; see https://spark.apache.org/docs/1.6.1/sql-programming-guide.html#hive-tables

from pyspark.sql import HiveContext
sqlContext = HiveContext(sc1)

and then you can test access to your table.

Also try

sqlContext.sql("show databases").show()
sqlContext.sql("show tables").show()

to see what you can access.
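
If you would rather have the script check for the table before querying it, something like the following might work. It is only a sketch: table_is_visible is a made-up helper name, and it relies on tableNames(), which SQLContext and HiveContext expose in Spark 1.x. The comparison is case-insensitive on the assumption that Hive reports table names in lowercase.

```python
def table_is_visible(sql_ctx, table_name, db_name=None):
    """Return True if table_name is listed by the given context.

    db_name is optional; when omitted, the current database is used.
    """
    if db_name is None:
        names = sql_ctx.tableNames()
    else:
        names = sql_ctx.tableNames(db_name)
    wanted = table_name.lower()
    return wanted in [name.lower() for name in names]
```

For example, table_is_visible(sqlContext, "ext_contents") should return False with a plain SQLContext when the table exists only in Hive, and True once sqlContext is a HiveContext that can reach the metastore.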

Great! Thanks.