
Access Hive Managed Tables with PySpark in Hue


Hi,

We are currently trying to access Hive managed tables using the PySpark shell in Hue. Our platform is CDP Private Cloud 7.1.6.


We are able to run PySpark jobs that read and write Hive managed tables with the following configuration:

spark-submit \
--name "example_job" \
--master yarn \
--deploy-mode cluster \
--py-files /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/pyspark_hwc-1.0.0.7.1.6.0-297.zip \
--jars /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.1.6.0-297.jar \
--conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://zookeeper-host:2181/default;retries=3;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" \
--conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@ORGANISATION \
--conf spark.datasource.hive.warehouse.load.staging.dir=/tmp \
--conf spark.sql.extensions="com.hortonworks.spark.sql.rule.Extensions" \
--conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \
--conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V1 \
example_job.py
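
For context, the body of example_job.py looks roughly like the sketch below (the table names are placeholders, not our real ones):

from pyspark.sql import SparkSession
from pyspark_llap import HiveWarehouseSession

spark = SparkSession.builder.appName("example_job").getOrCreate()

# Build the Hive Warehouse Connector session on top of the Spark session
hive = HiveWarehouseSession.session(spark).build()

# With spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V1,
# managed (ACID) tables can be read through plain Spark SQL
df = spark.sql("SELECT * FROM default.managed_source_table")

# Writes to managed tables go through the HWC data source
df.write.format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector") \
    .mode("append") \
    .option("table", "default.managed_target_table") \
    .save()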


Now we are wondering how to start a Spark context in Hue with this same configuration. Can you give us a hint on how to specify Spark conf values and libraries in Hue/Livy?
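
To illustrate what we are after: calling Livy's REST API directly, we imagine the equivalent session request would look something like the following sketch (the Livy host/port and the Kerberos auth setup are assumptions on our side, not something we have verified):

import requests
from requests_kerberos import HTTPKerberosAuth  # assuming a Kerberized cluster

# Hypothetical Livy endpoint; replace with the actual Livy server host/port
livy_url = "http://livy-host:8998/sessions"

payload = {
    "kind": "pyspark",
    "jars": ["/opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.1.6.0-297.jar"],
    "pyFiles": ["/opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/pyspark_hwc-1.0.0.7.1.6.0-297.zip"],
    "conf": {
        "spark.sql.hive.hiveserver2.jdbc.url": "jdbc:hive2://zookeeper-host:2181/default;retries=3;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2",
        "spark.sql.hive.hiveserver2.jdbc.url.principal": "hive/_HOST@ORGANISATION",
        "spark.datasource.hive.warehouse.load.staging.dir": "/tmp",
        "spark.sql.extensions": "com.hortonworks.spark.sql.rule.Extensions",
        "spark.kryo.registrator": "com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator",
        "spark.datasource.hive.warehouse.read.mode": "DIRECT_READER_V1",
    },
}

# Some Livy setups with CSRF protection enabled also require an
# X-Requested-By header on POST requests
resp = requests.post(livy_url, json=payload, auth=HTTPKerberosAuth())
print(resp.json())

What we have not figured out is how to make Hue pass the equivalent settings when it creates its Livy session.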


Thanks.
