Member since
06-01-2022
3
Posts
0
Kudos Received
0
Solutions
10-10-2023
10:29 AM
1 Kudo
Basic spark-submit command with respect to HWC - JDBC_CLUSTER mode pyspark --master yarn --jars /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.1.8.0-801.jar --py-files /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/pyspark_hwc-1.0.0.7.1.8.0-801.zip --conf spark.sql.hive.hiveserver2.jdbc.url='jdbc:hive2://c3757-node2.coelab.cloudera.com:2181,c3757-node3.coelab.cloudera.com:2181,c3757-node4.coelab.cloudera.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2' --conf spark.datasource.hive.warehouse.read.mode='JDBC_CLUSTER' --conf spark.datasource.hive.warehouse.load.staging.dir='/tmp' --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator To append data to an existing Hive ACID table, ensure that you specify the save mode as 'append'. Example Using Python version 2.7.5 (default, Jun 28 2022 15:30:04)
SparkSession available as 'spark'.
>>> from pyspark_llap import HiveWarehouseSession
>>> hive = HiveWarehouseSession.session(spark).build()
>>> df=hive.sql("select * from spark_hwc.employee")
23/10/10 17:20:00 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
23/10/10 17:20:08 INFO rule.HWCSwitchRule: Registering Listeners
>>> df.write.mode("append").format(HiveWarehouseSession().HIVE_WAREHOUSE_CONNECTOR).option("table", "spark_hwc.employee_new").save()
>>>
>>>
>>> hive.sql("select count(*) from spark_hwc.employee_new").show()
23/10/10 17:22:04 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
+---+
|_c0|
+---+
| 5|
+---+
>>> To overwrite data to an existing Hive ACID table, ensure that you specify the save mode as 'overwrite'. Example >>> df.write.mode("overwrite").format(HiveWarehouseSession().HIVE_WAREHOUSE_CONNECTOR).option("table", "spark_hwc.employee_new").save()
>>> To append or overwrite a new Hive ACID table, there's no need to specify the saveMode explicitly. The HWC will automatically create the new ACID table based on its structure and internally trigger the LOAD DATA INPATH command Ref - https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/integrating-hive-and-bi/topics/hive-read-write-operations.html
... View more