Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

spark-submit and hive tables - 'Table not found'

Expert Contributor

I need to submit a Python process which uses Spark. This process needs to access Hive tables, but is unable to find them.

To schedule the process to run, we use spark-submit, e.g.:

spark-submit pyspark_helloworld.py

This is the code (the first 5 lines were added in order to run the process from outside the pyspark command line, that is, via spark-submit):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
conf1 = SparkConf().setAppName("hellospark")
sc1 = SparkContext(conf=conf1)
sqlCtx = SQLContext(sc1)

print("Hello spark")
documents = sqlCtx.sql("SELECT count(*) as cnt FROM ext_contents")
print "Number of documents found:"
for content in documents.collect():
    print content.cnt


And this is the output:

Hello spark
Traceback (most recent call last):
  File "/home/bigdataquanam/pyspark_helloworld.py", line 10, in <module>
    documents = sqlCtx.sql("SELECT count(*) as cnt FROM ext_contents")
  File "/usr/hdp/2.4.2.0-258/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 580, in sql
  File "/usr/hdp/2.4.2.0-258/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/usr/hdp/2.4.2.0-258/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 51, in deco
pyspark.sql.utils.AnalysisException: u'Table not found: ext_contents;'
INFO SparkContext: Invoking stop() from shutdown hook

Any ideas? Thanks!

1 ACCEPTED SOLUTION

@Fernando Lopez Bello

You need a HiveContext to access Hive tables.

from pyspark.sql import HiveContext 
sqlCtx = HiveContext(sc1)
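Putting the fix into the original script, a minimal end-to-end sketch looks like this (Spark 1.6-era API, as used on HDP 2.4; the app name and table name are taken from the question):

```python
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf1 = SparkConf().setAppName("hellospark")
sc1 = SparkContext(conf=conf1)

# HiveContext picks up hive-site.xml and can resolve tables registered in
# the Hive metastore; a plain SQLContext cannot, which is why the original
# script raised 'Table not found: ext_contents'.
sqlCtx = HiveContext(sc1)

documents = sqlCtx.sql("SELECT count(*) as cnt FROM ext_contents")
print("Number of documents found:")
for content in documents.collect():
    print(content.cnt)
```

Note that when running via spark-submit, hive-site.xml must be visible to the driver (typically in the Spark conf directory) for the HiveContext to connect to the right metastore.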


3 REPLIES


You should use HiveContext, see https://spark.apache.org/docs/1.6.1/sql-programming-guide.html#hive-tables

from pyspark.sql import HiveContext
sqlContext = HiveContext(sc1)

and then you can test access to your table.

Also try

sqlContext.sql("show databases").show()
sqlContext.sql("show tables").show()

to see what you can access.


Expert Contributor

Great! Thanks.