
Access Hive Managed Tables with PySpark in Hue


Hi,

 

We are currently trying to access Hive managed tables from the PySpark shell in Hue. Our platform is CDP Private Cloud 7.1.6.

 

We are able to run PySpark jobs that read and write Hive managed tables with the following configuration:

spark-submit \
--name "example_job" \
--master yarn \
--deploy-mode cluster \
--py-files /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/pyspark_hwc-1.0.0.7.1.6.0-297.zip \
--jars /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.1.6.0-297.jar \
--conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://zookeeper-host:2181/default;retries=3;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" \
--conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@ORGANISATION \
--conf spark.datasource.hive.warehouse.load.staging.dir=/tmp \
--conf spark.sql.extensions="com.hortonworks.spark.sql.rule.Extensions" \
--conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \
--conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V1 \
example_job.py

 

Now we are wondering how to start a Spark context in Hue with this configuration. Could you give us a hint on how to specify Spark conf values and extra libraries in Hue/Livy?
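In case it helps frame the question: since Hue's PySpark shell goes through Livy, we would expect the same settings to be expressible as a Livy session-creation payload (Livy's `POST /sessions` endpoint accepts `kind`, `conf`, `jars`, and `pyFiles` fields). A minimal sketch of what we have in mind, with the paths from the spark-submit command above and placeholder hostnames — we are not sure whether Hue lets us pass such a payload through, which is essentially the question:

```python
import json

# Sketch only: the spark-submit settings above expressed as a Livy
# session-creation payload (POST http://livy-host:8998/sessions).
# "zookeeper-host" and the Kerberos principal are placeholders from
# the original command.
payload = {
    "kind": "pyspark",
    "jars": [
        "/opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/"
        "hive-warehouse-connector-assembly-1.0.0.7.1.6.0-297.jar"
    ],
    "pyFiles": [
        "/opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/"
        "pyspark_hwc-1.0.0.7.1.6.0-297.zip"
    ],
    "conf": {
        "spark.sql.hive.hiveserver2.jdbc.url": (
            "jdbc:hive2://zookeeper-host:2181/default;retries=3;"
            "serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
        ),
        "spark.sql.hive.hiveserver2.jdbc.url.principal": "hive/_HOST@ORGANISATION",
        "spark.datasource.hive.warehouse.load.staging.dir": "/tmp",
        "spark.sql.extensions": "com.hortonworks.spark.sql.rule.Extensions",
        # Note: "Kyro" is the actual class-name spelling in the library.
        "spark.kryo.registrator": "com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator",
        "spark.datasource.hive.warehouse.read.mode": "DIRECT_READER_V1",
    },
}

# This is the JSON body we would POST to Livy's /sessions endpoint.
print(json.dumps(payload, indent=2))
```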

 

Thanks.
