<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive Warehouse Connector concurrent write to Hive table issue using spark in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/391630#M247701</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/92016"&gt;@ggangadharan&lt;/a&gt;, thanks for your reply.&lt;/P&gt;&lt;P&gt;Multiple Spark jobs write DataFrames to the same Hive table via HWC. Each Spark job applies a different set of ingestion/transformation steps, and each also writes to a common Hive table - a kind of audit/log table recording each job's ingestion status, time, etc. These two tables are shared across all Spark jobs. I am executing the Spark jobs via an Oozie workflow. I can see the following stack trace:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Error creating/checking hive table An error occurred while calling o117.save. : org.apache.spark.SparkException: Writing job aborted. at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:92) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:146) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:142) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:170) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:167) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:142) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:93) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:91) at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:704) at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:704) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80) at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:704) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:280) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: java.lang.RuntimeException: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.FileNotFoundException: File hdfs://HDFS-HA/warehouse/tablespace/managed/hive/my_db_name.db/my_log_table_name/.hive-staging_hive_2024-08-09_16-16-49_474_6465304774056330032-46249 does not exist. at com.hortonworks.spark.sql.hive.llap.writers.HiveWarehouseDataSourceWriter.commit(HiveWarehouseDataSourceWriter.java:232) at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:76) ... 26 more Caused by: java.sql.SQLException: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. 
&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;java.io.FileNotFoundException: File hdfs://HDFS-HA/warehouse/tablespace/managed/hive/my_db_name.db/my_log_table_name/.hive-staging_hive_2024-08-09_16-16-49_474_6465304774056330032-46249 does not exist. at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:411) at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:276) at org.apache.hive.jdbc.HivePreparedStatement.execute(HivePreparedStatement.java:101) at com.hortonworks.spark.sql.hive.llap.wrapper.PreparedStatementWrapper.execute(PreparedStatementWrapper.java:48) at com.hortonworks.spark.sql.hive.llap.JDBCWrapper.executeUpdate(HS2JDBCWrapper.scala:396) at com.hortonworks.spark.sql.hive.llap.DefaultJDBCWrapper.executeUpdate(HS2JDBCWrapper.scala) at com.hortonworks.spark.sql.hive.llap.writers.HiveWarehouseDataSourceWriter.handleWriteWithSaveMode(HiveWarehouseDataSourceWriter.java:345) at com.hortonworks.spark.sql.hive.llap.writers.HiveWarehouseDataSourceWriter.commit(HiveWarehouseDataSourceWriter.java:230&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Hive version - Hive 3.1.3000.7.1.8.55-1&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;This is how I am trying to ingest using Spark:&lt;/STRONG&gt;&lt;BR /&gt;df.write.mode("append").format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)\&lt;BR /&gt;.option("table", table_name).option("database", database_name).save()&lt;/P&gt;</description>
    <pubDate>Mon, 12 Aug 2024 08:42:43 GMT</pubDate>
    <dc:creator>bigdatacm</dc:creator>
    <dc:date>2024-08-12T08:42:43Z</dc:date>
    <item>
      <title>Hive Warehouse Connector concurrent write to Hive table issue using spark</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/391606#M247693</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I am trying to write my Spark DataFrame to a Hive table using HWC (Hive Warehouse Connector). My Spark application is in PySpark, and I have 5 concurrent Spark applications running at the same time, all trying to write to the same Hive table, probably simultaneously. I am getting the following error:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Error - caused by: java.lang.RuntimeException: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.FileNotFoundException: File hdfs://tablespace/managed/hive/my_db_name.db/hive_table_name/.hive-staging_hive_2024-08-05_15-55-26_488_5678420092852048777-45678&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Does HWC not allow concurrent writes to the same Hive table? Or is this a limitation of Hive tables?&lt;/P&gt;</description>
      <pubDate>Sat, 10 Aug 2024 23:53:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/391606#M247693</guid>
      <dc:creator>bigdatacm</dc:creator>
      <dc:date>2024-08-10T23:53:03Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Warehouse Connector concurrent write to Hive table issue using spark</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/391626#M247698</link>
      <description>&lt;P&gt;Could you please share the cluster version and the spark-submit command?&lt;BR /&gt;&lt;BR /&gt;What's the HWC execution mode?&lt;BR /&gt;Can you please share the complete stack trace?&lt;BR /&gt;&lt;BR /&gt;Since the issue is in&amp;nbsp;&lt;SPAN&gt;&lt;EM&gt;MoveTask&lt;/EM&gt;&lt;/SPAN&gt;,&amp;nbsp;&lt;EM&gt;HIVE-24163&lt;/EM&gt; could be the problem.&lt;/P&gt;</description>
      <pubDate>Mon, 12 Aug 2024 06:37:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/391626#M247698</guid>
      <dc:creator>ggangadharan</dc:creator>
      <dc:date>2024-08-12T06:37:41Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Warehouse Connector concurrent write to Hive table issue using spark</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/391630#M247701</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/92016"&gt;@ggangadharan&lt;/a&gt;, thanks for your reply.&lt;/P&gt;&lt;P&gt;Multiple Spark jobs write DataFrames to the same Hive table via HWC. Each Spark job applies a different set of ingestion/transformation steps, and each also writes to a common Hive table - a kind of audit/log table recording each job's ingestion status, time, etc. These two tables are shared across all Spark jobs. I am executing the Spark jobs via an Oozie workflow. I can see the following stack trace:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Error creating/checking hive table An error occurred while calling o117.save. : org.apache.spark.SparkException: Writing job aborted. at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:92) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:146) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:142) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:170) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:167) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:142) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:93) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:91) at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:704) at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:704) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80) at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:704) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:280) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: java.lang.RuntimeException: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.FileNotFoundException: File hdfs://HDFS-HA/warehouse/tablespace/managed/hive/my_db_name.db/my_log_table_name/.hive-staging_hive_2024-08-09_16-16-49_474_6465304774056330032-46249 does not exist. at com.hortonworks.spark.sql.hive.llap.writers.HiveWarehouseDataSourceWriter.commit(HiveWarehouseDataSourceWriter.java:232) at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:76) ... 26 more Caused by: java.sql.SQLException: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. 
&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;java.io.FileNotFoundException: File hdfs://HDFS-HA/warehouse/tablespace/managed/hive/my_db_name.db/my_log_table_name/.hive-staging_hive_2024-08-09_16-16-49_474_6465304774056330032-46249 does not exist. at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:411) at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:276) at org.apache.hive.jdbc.HivePreparedStatement.execute(HivePreparedStatement.java:101) at com.hortonworks.spark.sql.hive.llap.wrapper.PreparedStatementWrapper.execute(PreparedStatementWrapper.java:48) at com.hortonworks.spark.sql.hive.llap.JDBCWrapper.executeUpdate(HS2JDBCWrapper.scala:396) at com.hortonworks.spark.sql.hive.llap.DefaultJDBCWrapper.executeUpdate(HS2JDBCWrapper.scala) at com.hortonworks.spark.sql.hive.llap.writers.HiveWarehouseDataSourceWriter.handleWriteWithSaveMode(HiveWarehouseDataSourceWriter.java:345) at com.hortonworks.spark.sql.hive.llap.writers.HiveWarehouseDataSourceWriter.commit(HiveWarehouseDataSourceWriter.java:230&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Hive version - Hive 3.1.3000.7.1.8.55-1&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;This is how I am trying to ingest using Spark:&lt;/STRONG&gt;&lt;BR /&gt;df.write.mode("append").format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)\&lt;BR /&gt;.option("table", table_name).option("database", database_name).save()&lt;/P&gt;</description>
      <pubDate>Mon, 12 Aug 2024 08:42:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/391630#M247701</guid>
      <dc:creator>bigdatacm</dc:creator>
      <dc:date>2024-08-12T08:42:43Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Warehouse Connector concurrent write to Hive table issue using spark</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/391632#M247703</link>
      <description>&lt;P&gt;Can you please share the stack trace from the HiveServer2 logs and the spark-submit command used?&lt;/P&gt;</description>
      <pubDate>Mon, 12 Aug 2024 08:51:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/391632#M247703</guid>
      <dc:creator>ggangadharan</dc:creator>
      <dc:date>2024-08-12T08:51:19Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Warehouse Connector concurrent write to Hive table issue using spark</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/391633#M247704</link>
      <description>&lt;P&gt;Sure, I will share. It seems this Hive staging directory is being created at the table level? I tried adding partitions and ingesting into those partitions; my partition columns were (batch name, date), so each Spark job writes to its own batch-name partition. However, that also failed, since the Hive staging temp directory is created at the table level, not at the partition level.&lt;/P&gt;</description>
      <pubDate>Mon, 12 Aug 2024 09:02:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/391633#M247704</guid>
      <dc:creator>bigdatacm</dc:creator>
      <dc:date>2024-08-12T09:02:07Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Warehouse Connector concurrent write to Hive table issue using spark</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/392360#M248097</link>
      <description>&lt;P&gt;Make sure the following is enabled at the cluster level:&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;&lt;STRONG&gt;hive.acid.direct.insert.enabled&lt;/STRONG&gt;&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Also use the formats below to insert into partitioned tables.&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Static partition&lt;/STRONG&gt;&lt;/P&gt;&lt;/DIV&gt;&lt;LI-CODE lang="markup"&gt;df.write.format(HIVE_WAREHOUSE_CONNECTOR).mode("append").option("partition", "c1='val1',c2='val2'").option("table", "t1").save();&lt;/LI-CODE&gt;&lt;P&gt;&lt;STRONG&gt;Dynamic partition&lt;/STRONG&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;df.write.format(HIVE_WAREHOUSE_CONNECTOR).mode("append").option("partition", "c1,c2").option("table", "t1").save();&lt;/LI-CODE&gt;</description>
      <pubDate>Thu, 22 Aug 2024 12:25:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/392360#M248097</guid>
      <dc:creator>ggangadharan</dc:creator>
      <dc:date>2024-08-22T12:25:46Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Warehouse Connector concurrent write to Hive table issue using spark</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/392420#M248123</link>
      <description>&lt;P&gt;Thank you. It seems the Hive Warehouse Connector is creating these tmp/staging directories at the table level rather than the partition level.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Aug 2024 08:54:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/392420#M248123</guid>
      <dc:creator>bigdatacm</dc:creator>
      <dc:date>2024-08-23T08:54:35Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Warehouse Connector concurrent write to Hive table issue using spark</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/392442#M248127</link>
      <description>&lt;P&gt;Since the partition information was not mentioned in the write statement, the staging directory was created in the table directory instead of the partition directory.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Aug 2024 13:10:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/392442#M248127</guid>
      <dc:creator>ggangadharan</dc:creator>
      <dc:date>2024-08-23T13:10:25Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Warehouse Connector concurrent write to Hive table issue using spark</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/392559#M248169</link>
      <description>&lt;P&gt;I have tried the partition option (dynamic), and the staging directory was still created at the table level. It seems this works only for static partitions.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Aug 2024 08:09:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/392559#M248169</guid>
      <dc:creator>bigdatacm</dc:creator>
      <dc:date>2024-08-26T08:09:24Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Warehouse Connector concurrent write to Hive table issue using spark</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/392680#M248206</link>
      <description>&lt;P&gt;&lt;SPAN&gt;When writing to a statically partitioned table using HWC, the following query is internally fired to Hive through JDBC after writing data to a temporary location:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Spark write statement:&lt;/STRONG&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;df.write.format(HIVE_WAREHOUSE_CONNECTOR).mode("append").option("partition", "c1='val1',c2='val2'").option("table", "t1").save();&lt;/LI-CODE&gt;&lt;P&gt;&lt;STRONG&gt;HWC internal query:&lt;/STRONG&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;LOAD DATA INPATH '&amp;lt;spark.datasource.hive.warehouse.load.staging.dir&amp;gt;' [OVERWRITE] INTO TABLE db.t1 PARTITION (c1='val1',c2='val2');&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;During static partitioning, the partition information is known at compile time, resulting in the creation of a staging directory in the partition directory.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;On the other hand, when writing to a dynamically partitioned table using HWC, the following query is internally fired to Hive through JDBC after writing data to a temporary location:&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Spark write statement:&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;df.write.format(HIVE_WAREHOUSE_CONNECTOR).mode("append").option("partition", "c1='val1',c2").option("table", "t1").save();&lt;/LI-CODE&gt;&lt;P&gt;&lt;STRONG&gt;HWC internal query:&lt;/STRONG&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;CREATE TEMPORARY EXTERNAL TABLE db.job_id_table(cols....) STORED AS ORC LOCATION '&amp;lt;spark.datasource.hive.warehouse.load.staging.dir&amp;gt;'; 
INSERT INTO TABLE t1 PARTITION (c1='val1',c2) SELECT &amp;lt;cols&amp;gt; FROM db.job_id_table;&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;During dynamic partitioning, the partition information is known only at runtime, hence the staging directory is created at the table level. Once the DAG is completed, the MoveTask will move the files to the respective partitions.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 28 Aug 2024 07:40:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-Warehouse-Connector-concurrent-write-to-Hive-table/m-p/392680#M248206</guid>
      <dc:creator>ggangadharan</dc:creator>
      <dc:date>2024-08-28T07:40:01Z</dc:date>
    </item>
  </channel>
</rss>

