Created 11-22-2016 05:22 PM
I need to submit a Python process which uses Spark. This process needs to access Hive tables, but is unable to find them.
In order to schedule the process to run, we use spark-submit, e.g.:
spark-submit pyspark_helloworld.py
This is the code (the first 5 lines were added in order to run the process outside the pyspark command line, that is, via spark-submit):
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
conf1 = SparkConf().setAppName("hellospark")
sc1 = SparkContext(conf=conf1)
sqlCtx = SQLContext(sc1)

print("Hello spark")
documents = sqlCtx.sql("SELECT count(*) as cnt FROM ext_contents")
print "Number of documents found:"
for content in documents.collect():
    print content.cnt
And this is the output:
Hello spark
Traceback (most recent call last):
  File "/home/bigdataquanam/pyspark_helloworld.py", line 10, in <module>
    documents = sqlCtx.sql("SELECT count(*) as cnt FROM ext_contents")
  File "/usr/hdp/2.4.2.0-258/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 580, in sql
  File "/usr/hdp/2.4.2.0-258/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/usr/hdp/2.4.2.0-258/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 51, in deco
pyspark.sql.utils.AnalysisException: u'Table not found: ext_contents;'
INFO SparkContext: Invoking stop() from shutdown hook
Any ideas? Thanks!
Created 11-22-2016 05:45 PM
You should use HiveContext; see https://spark.apache.org/docs/1.6.1/sql-programming-guide.html#hive-tables
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc1)
and then you can test access to your table.
Also try
sqlContext.sql("show databases").show()
sqlContext.sql("show tables").show()
to see what you can access.
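For reference, here is a minimal corrected version of the whole script. This is just a sketch: it assumes Spark 1.6 with Python 2 as in your post, and that hive-site.xml is visible to Spark (the default on HDP); the table name ext_contents is taken from your question.

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

# Same setup as before, but with HiveContext so queries resolve against the Hive metastore
conf1 = SparkConf().setAppName("hellospark")
sc1 = SparkContext(conf=conf1)
sqlCtx = HiveContext(sc1)

print("Hello spark")
# With plain SQLContext this lookup fails with "Table not found: ext_contents"
documents = sqlCtx.sql("SELECT count(*) as cnt FROM ext_contents")
print("Number of documents found:")
for content in documents.collect():
    print(content.cnt)

You can then submit it exactly as before with spark-submit pyspark_helloworld.py.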
Created 11-22-2016 05:51 PM
You need to have a HiveContext to access Hive tables.
from pyspark.sql import HiveContext
sqlCtx = HiveContext(sc1)
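With that one change, the lookup in your script should resolve through the Hive metastore (a quick sanity check, assuming hive-site.xml is on Spark's classpath, which is standard on HDP):

# Should now return the count instead of raising AnalysisException
sqlCtx.sql("SELECT count(*) as cnt FROM ext_contents").show()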
Created 11-22-2016 05:51 PM
Great! Thanks.