<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Unable to append overwrite Hive ACID table using spark and HWC - Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-append-overwrite-Hive-ACID-table-using-spark-and/m-p/377446#M243265</link>
    <description>&lt;P&gt;Basic pyspark command for using HWC in JDBC_CLUSTER mode, with examples of appending to and overwriting an existing Hive ACID table by setting the save mode to 'append' or 'overwrite'.&lt;/P&gt;</description>
    <pubDate>Tue, 10 Oct 2023 17:29:43 GMT</pubDate>
    <dc:creator>ggangadharan</dc:creator>
    <dc:date>2023-10-10T17:29:43Z</dc:date>
    <item>
      <title>Unable to append overwrite Hive ACID table using spark and HWC</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-append-overwrite-Hive-ACID-table-using-spark-and/m-p/357606#M237635</link>
      <description>&lt;P&gt;I'm trying to use HWC (HiveWarehouseConnector) with Spark:&lt;BR /&gt;- to append to an existing Hive ACID table&lt;BR /&gt;- to overwrite an existing Hive ACID table&lt;BR /&gt;- to append to a new Hive ACID table that does not yet exist&lt;BR /&gt;- to overwrite a new Hive ACID table that does not yet exist&lt;/P&gt;&lt;P&gt;Writing to a new Hive ACID table works, but writing to an existing table with append/overwrite does not.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please let me know how to use HWC to append to or overwrite existing ACID tables.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The list of steps I followed on a CDP 7.1.7 SP1 single-node cluster is given below:&lt;/P&gt;&lt;P&gt;1) Ran kinit username&lt;BR /&gt;2) Ran the commands below in the beeline hive client command line:&lt;BR /&gt;drop database testDatabase cascade;&lt;BR /&gt;drop database tpcds_bin_partitioned_orc_1000 cascade;&lt;BR /&gt;create database tpcds_bin_partitioned_orc_1000;&lt;BR /&gt;use tpcds_bin_partitioned_orc_1000;&lt;BR /&gt;create managed table web_sales(ws_sold_time_sk bigint, ws_ship_date_sk bigint) stored as orc;&lt;BR /&gt;insert into web_sales values(80000,1), (80001,2), (80002,3);&lt;BR /&gt;create database testDatabase;&lt;/P&gt;&lt;P&gt;3) I prepared a file called hwc_example2.scala containing the same example as in&lt;BR /&gt;&lt;A href="https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/integrating-hive-and-bi/topics/hive-hwc-configure-writes.html" target="_blank" rel="noopener"&gt;https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/integrating-hive-and-bi/topics/hive-hwc-configure-writes.html&lt;/A&gt;&lt;BR /&gt;import com.hortonworks.hwc.HiveWarehouseSession&lt;BR /&gt;import com.hortonworks.hwc.HiveWarehouseSession._&lt;BR /&gt;val hive = HiveWarehouseSession.session(spark).build();&lt;BR /&gt;hive.setDatabase("tpcds_bin_partitioned_orc_1000");&lt;BR /&gt;val df = hive.sql("select * from web_sales");&lt;BR
/&gt;df.createOrReplaceTempView("web_sales");&lt;BR /&gt;hive.setDatabase("testDatabase");&lt;BR /&gt;hive.createTable("newTable").ifNotExists().column("ws_sold_time_sk", "bigint").column("ws_ship_date_sk", "bigint").create();&lt;BR /&gt;sql("SELECT ws_sold_time_sk, ws_ship_date_sk FROM web_sales WHERE ws_sold_time_sk &amp;gt; 80000").write.format(HIVE_WAREHOUSE_CONNECTOR).mode("append").option("table", "newTable").save();&lt;/P&gt;&lt;P&gt;4) I started the spark-shell using the command below:&lt;BR /&gt;spark-shell --master yarn --deploy-mode client --jars&lt;BR /&gt;/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/jars/hive-warehouse-connector-assembly-1.0.0.7.1.7.1000-141.jar&lt;BR /&gt;--conf spark.datasource.hive.warehouse.read.mode=secure_access&lt;BR /&gt;--conf spark.datasource.hive.warehouse.load.staging.dir=hdfs://ip-nn-nn-nn-nn.ec2.internal:8020/tmp/staging/hwc&lt;BR /&gt;--conf spark.yarn.dist.files=/etc/hive/conf.cloudera.hive/hive-site.xml,/etc/hive/conf/hive-env.sh&lt;BR /&gt;--conf spark.driver.extraJavaOptions=-Djavax.security.auth.useSubjectCredsOnly=false&lt;BR /&gt;--conf spark.sql.crossJoin.enabled=true --conf spark.hadoop.hive.enforce.bucketing=false&lt;BR /&gt;--conf spark.hadoop.hive.enforce.sorting=false&lt;BR /&gt;--conf spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://ip-10-203-4-139.ec2.internal:10000/&lt;BR /&gt;--conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/ip-nn-nn-nn-nn.ec2.internal@CLOUDERALABS.COM&lt;/P&gt;&lt;P&gt;5) In the spark-shell prompt I ran the command ":load hwc_example2.scala" and got an exception that the table already exists. 
Exception trace is given below.&lt;BR /&gt;scala&amp;gt; :load hwc_example2.scala&lt;BR /&gt;Loading hwc_example2.scala...&lt;BR /&gt;import com.hortonworks.hwc.HiveWarehouseSession&lt;BR /&gt;import com.hortonworks.hwc.HiveWarehouseSession._&lt;BR /&gt;hive: com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl = com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl@a15e4fd&lt;BR /&gt;22/11/15 08:50:44 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist&lt;BR /&gt;df: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [ws_sold_time_sk: bigint, ws_ship_date_sk: bigint]&lt;BR /&gt;22/11/15 08:50:54 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist&lt;BR /&gt;Hive Session ID = 90af6bc3-1efa-4ab4-982c-61d2deb46ece&lt;BR /&gt;22/11/15 08:51:04 ERROR v2.WriteToDataSourceV2Exec: Data source writer com.hortonworks.spark.sql.hive.llap.writers.HiveWarehouseDataSourceWriter@517c2be7 is aborting.&lt;BR /&gt;22/11/15 08:51:04 ERROR writers.HiveWarehouseDataSourceWriter: Aborted DataWriter job 69330822-9d06-48ac-baac-8c7eacfffebc&lt;BR /&gt;22/11/15 08:51:04 ERROR v2.WriteToDataSourceV2Exec: Data source writer com.hortonworks.spark.sql.hive.llap.writers.HiveWarehouseDataSourceWriter@517c2be7 aborted.&lt;BR /&gt;org.apache.spark.SparkException: Writing job aborted.&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:92)&lt;BR /&gt;at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:141)&lt;BR /&gt;at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:137)&lt;BR /&gt;at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:165)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)&lt;BR /&gt;at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:162)&lt;BR /&gt;at 
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:137)&lt;BR /&gt;at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:93)&lt;BR /&gt;at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:91)&lt;BR /&gt;at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:704)&lt;BR /&gt;at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:704)&lt;BR /&gt;at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)&lt;BR /&gt;at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)&lt;BR /&gt;at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)&lt;BR /&gt;at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:704)&lt;BR /&gt;at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:280)&lt;BR /&gt;... 78 elided&lt;BR /&gt;Caused by: java.lang.RuntimeException: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.ddl.DDLTask. AlreadyExistsException(message:Table hive.testDatabase.newTable already exists)&lt;BR /&gt;at com.hortonworks.spark.sql.hive.llap.writers.HiveWarehouseDataSourceWriter.commit(HiveWarehouseDataSourceWriter.java:225)&lt;BR /&gt;at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:76)&lt;BR /&gt;... 93 more&lt;BR /&gt;Caused by: java.sql.SQLException: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.ddl.DDLTask. 
AlreadyExistsException(message:Table hive.testDatabase.newTable already exists)&lt;BR /&gt;at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:411)&lt;BR /&gt;at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:276)&lt;BR /&gt;at org.apache.hive.jdbc.HivePreparedStatement.execute(HivePreparedStatement.java:101)&lt;BR /&gt;at org.apache.commons.dbcp2.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:94)&lt;BR /&gt;at org.apache.commons.dbcp2.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:94)&lt;BR /&gt;at com.hortonworks.spark.sql.hive.llap.wrapper.PreparedStatementWrapper.execute(PreparedStatementWrapper.java:37)&lt;BR /&gt;at com.hortonworks.spark.sql.hive.llap.JDBCWrapper.executeUpdate(HS2JDBCWrapper.scala:370)&lt;BR /&gt;at com.hortonworks.spark.sql.hive.llap.DefaultJDBCWrapper.executeUpdate(HS2JDBCWrapper.scala)&lt;BR /&gt;at com.hortonworks.spark.sql.hive.llap.writers.HiveWarehouseDataSourceWriter.handleWriteWithSaveMode(HiveWarehouseDataSourceWriter.java:330)&lt;BR /&gt;at com.hortonworks.spark.sql.hive.llap.writers.HiveWarehouseDataSourceWriter.commit(HiveWarehouseDataSourceWriter.java:223)&lt;BR /&gt;... 94 more&lt;/P&gt;&lt;P&gt;scala&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;BR /&gt;Radhakrishnan&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 07:47:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-append-overwrite-Hive-ACID-table-using-spark-and/m-p/357606#M237635</guid>
      <dc:creator>trkrishnan</dc:creator>
      <dc:date>2026-04-21T07:47:49Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to append overwrite Hive ACID table using spark and HWC</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-append-overwrite-Hive-ACID-table-using-spark-and/m-p/377446#M243265</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Basic pyspark launch command for using HWC in JDBC_CLUSTER mode:&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt; pyspark --master yarn   --jars /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.1.8.0-801.jar   --py-files /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/pyspark_hwc-1.0.0.7.1.8.0-801.zip   --conf spark.sql.hive.hiveserver2.jdbc.url='jdbc:hive2://c3757-node2.coelab.cloudera.com:2181,c3757-node3.coelab.cloudera.com:2181,c3757-node4.coelab.cloudera.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'   --conf spark.datasource.hive.warehouse.read.mode='JDBC_CLUSTER'   --conf spark.datasource.hive.warehouse.load.staging.dir='/tmp'   --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions   --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;To append data to an existing Hive ACID table, specify the save mode as 'append'.&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;Example:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Using Python version 2.7.5 (default, Jun 28 2022 15:30:04)
SparkSession available as 'spark'.
&amp;gt;&amp;gt;&amp;gt; from pyspark_llap import HiveWarehouseSession
&amp;gt;&amp;gt;&amp;gt; hive = HiveWarehouseSession.session(spark).build()
&amp;gt;&amp;gt;&amp;gt; df=hive.sql("select * from spark_hwc.employee")
23/10/10 17:20:00 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
23/10/10 17:20:08 INFO rule.HWCSwitchRule: Registering Listeners
&amp;gt;&amp;gt;&amp;gt; df.write.mode("append").format(HiveWarehouseSession().HIVE_WAREHOUSE_CONNECTOR).option("table", "spark_hwc.employee_new").save()
&amp;gt;&amp;gt;&amp;gt;
&amp;gt;&amp;gt;&amp;gt;
&amp;gt;&amp;gt;&amp;gt; hive.sql("select count(*) from spark_hwc.employee_new").show()
23/10/10 17:22:04 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
+---+
|_c0|
+---+
|  5|
+---+

&amp;gt;&amp;gt;&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;To overwrite the data in an existing Hive ACID table, specify the save mode as 'overwrite'.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;Example:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;gt;&amp;gt;&amp;gt; df.write.mode("overwrite").format(HiveWarehouseSession().HIVE_WAREHOUSE_CONNECTOR).option("table", "spark_hwc.employee_new").save()
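# Illustration only, not from the original session ('spark_hwc.employee_copy' is a
# hypothetical table name): if the target table does not exist yet, HWC creates the
# ACID table automatically, so no explicit save mode is required, e.g.
# df.write.format(HiveWarehouseSession().HIVE_WAREHOUSE_CONNECTOR).option("table", "spark_hwc.employee_copy").save()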
&amp;gt;&amp;gt;&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;&lt;SPAN&gt;When appending to or overwriting a Hive ACID table that does not yet exist, there is no need to specify the save mode explicitly: HWC automatically creates the new ACID table based on the DataFrame's structure and internally triggers a LOAD DATA INPATH command.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Ref -&amp;nbsp;&lt;A href="https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/integrating-hive-and-bi/topics/hive-read-write-operations.html" target="_blank"&gt;https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/integrating-hive-and-bi/topics/hive-read-write-operations.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 10 Oct 2023 17:29:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-append-overwrite-Hive-ACID-table-using-spark-and/m-p/377446#M243265</guid>
      <dc:creator>ggangadharan</dc:creator>
      <dc:date>2023-10-10T17:29:43Z</dc:date>
    </item>
  </channel>
</rss>

