<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Error while running Insert overwrite query on hive table in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Error-while-running-Insert-overwrite-query-on-hive-table/m-p/320536#M228172</link>
    <description>&lt;P&gt;Hello Experts,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We have identified that two records are duplicated in our Hive tables. We have taken a backup of the tables in case we need to roll back. However, when we run an INSERT OVERWRITE command (e.g. insert overwrite table demo select distinct * from demo;) on the smallest table, with a raw volume of 570 GB, we get the following error.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;INFO : 2021-07-11 15:33:47,756 Stage-0_0: 122/122 Finished Stage-1_0: 70(+380,-64)/978&lt;BR /&gt;INFO : state = STARTED&lt;BR /&gt;INFO : state = FAILED&lt;BR /&gt;ERROR : Status: Failed&lt;BR /&gt;ERROR : FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask&lt;BR /&gt;DEBUG : Shutting down query insert overwrite table raw_switch.partitiontable select distinct * from raw_switch.partitiontable&lt;BR /&gt;INFO : Completed executing command(queryId=hive_19660743242525_d9c3a756-452f-472c-a92e-2b966c37d0ce); Time taken: 4078.407 seconds&lt;BR /&gt;DEBUG : Shutting down query insert overwrite table raw_switch.partitiontable select distinct * from raw_switch.partitiontable&lt;BR /&gt;Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask (state=08S01,code=3)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please find the HiveServer2 logs below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2021-07-11 15:33:49,834 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: [HiveServer2-Background-Pool: Thread-29919]: Call: delete took 30ms&lt;BR /&gt;2021-07-11 15:33:49,834 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-29919]: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The default Hive parameters are as 
follows:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;hive.execution.engine=spark;&lt;BR /&gt;spark.executor.memory=12g;&lt;BR /&gt;spark.executor.cores=4;&lt;BR /&gt;hive.optimize.sort.dynamic.partition=true;&lt;BR /&gt;hive.exec.dynamic.partition.mode=strict;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Kindly suggest how to resolve this issue. Do we need to change any of the above default parameters, or other parameters that we have missed?&lt;BR /&gt;We hope we are running the correct INSERT OVERWRITE query to remove the duplicate records.&lt;/P&gt;</description>
    <pubDate>Mon, 12 Jul 2021 14:19:52 GMT</pubDate>
    <dc:creator>HanzalaShaikh</dc:creator>
    <dc:date>2021-07-12T14:19:52Z</dc:date>
  </channel>
</rss>

