In short: I have a working Hive on HDP 3 which I cannot reach from pyspark, running under YARN (on the same HDP cluster). How do I get pyspark to find my tables?

`spark.catalog.listDatabases()` only shows `default`, and no query I run shows up in my Hive logs.
This is my code, with Spark 2.3.1:

```python
from pyspark.sql import SparkSession
from pyspark.conf import SparkConf

settings = []
conf = SparkConf().setAppName("Guillaume is here").setAll(settings)
spark = (
    SparkSession
    .builder
    .master('yarn')
    .config(conf=conf)
    .enableHiveSupport()
    .getOrCreate()
)
print(spark.catalog.listDatabases())
```
Note that `settings` is empty. I thought it would be sufficient, because in the logs I see
loading hive config file: file:/etc/spark2/22.214.171.124-187/0/hive-site.xml
and more interestingly
Registering function intersectgroups io.x.x.IntersectGroups
This is a UDF I wrote and added to Hive manually, so some sort of connection is being made. The only output I get (apart from logs) is:
[ Database(name=u'default', description=u'default database', locationUri=u'hdfs://HdfsNameService/apps/spark/warehouse')]
I understand that I should set `spark.sql.warehouse.dir` in `settings`. But whether I set it to the value I find in hive-site.xml, to the path of the database I am interested in (it is not in the default location), or to its parent, nothing changes.

I put many other config options in `settings` as well (including the thrift URIs); still no changes.
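For illustration, the kinds of entries I tried in `settings` looked like the sketch below; the warehouse path and metastore URI are placeholders, not my cluster's real values:

```python
# Placeholder config pairs for `settings` (illustrative values only);
# the list is then passed to SparkConf().setAll(settings) as in the code above.
settings = [
    ("spark.sql.warehouse.dir", "/warehouse/tablespace/managed/hive"),
    ("hive.metastore.uris", "thrift://metastore-host:9083"),
]
print(dict(settings)["spark.sql.warehouse.dir"])
```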
I have also seen suggestions to copy hive-site.xml into the spark2 conf dir. I did that on all nodes of my cluster; no changes.
My command to run is:
```shell
HDP_VERSION=126.96.36.199-187 \
PYTHONPATH=.:/usr/hdp/current/spark2-client/python/:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip \
SPARK_HOME=/usr/hdp/current/spark2-client \
HADOOP_USER_NAME=hive \
spark-submit \
  --master yarn \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-188.8.131.52.0.1.0-187.jar \
  --py-files /usr/hdp/current/hive_warehouse_connector/pyspark_hwc-184.108.40.206.0.1.0-187.zip \
  --files /etc/hive/conf/hive-site.xml \
  ./subjanal/anal.py
```
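Since the command already ships the Hive Warehouse Connector jar and zip, for reference this is roughly what the HWC route would look like. This is a minimal, untested sketch assuming the connector is configured as the HDP 3 documentation describes (in particular `spark.sql.hive.hiveserver2.jdbc.url` must be set, and `mydb.mytable` is a placeholder name):

```python
# Sketch of accessing Hive through the Hive Warehouse Connector (HDP 3);
# requires the HWC jar/zip from the spark-submit command above and a
# configured spark.sql.hive.hiveserver2.jdbc.url. Cluster-only: this will
# not run outside the cluster.
from pyspark_llap import HiveWarehouseSession

hive = HiveWarehouseSession.session(spark).build()
hive.showDatabases().show()  # lists Hive databases via HWC, not the Spark catalog
hive.executeQuery("SELECT * FROM mydb.mytable LIMIT 10").show()  # placeholder table
```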