We have identified that 2 records have been duplicated in our Hive tables. We have taken a backup of the tables in case we need to roll back. But when we run an insert overwrite command (e.g. insert overwrite table demo select distinct * from demo;) on the smallest table, which has a raw volume of 570 GB, we get the following error:
INFO : 2021-07-11 15:33:47,756 Stage-0_0: 122/122 Finished Stage-1_0: 70(+380,-64)/978
INFO : state = STARTED
INFO : state = FAILED
ERROR : Status: Failed
ERROR : FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
DEBUG : Shutting down query insert overwrite table raw_switch.partitiontable select distinct * from raw_switch.partitiontable
INFO : Completed executing command(queryId=hive_19660743242525_d9c3a756-452f-472c-a92e-2b966c37d0ce); Time taken: 4078.407 seconds
DEBUG : Shutting down query insert overwrite table raw_switch.partitiontable select distinct * from raw_switch.partitiontable
Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask (state=08S01,code=3)
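For reference, this is the exact shape of the statement we run against the partitioned table (a minimal sketch; the two SET lines are the standard Hive dynamic-partition settings, which we believe are required here because the overwrite has no static PARTITION clause — please correct us if that assumption is wrong):

    -- Assumption: dynamic partitioning must be enabled because the
    -- partition column(s) flow through SELECT DISTINCT * rather than
    -- being fixed by a static PARTITION clause.
    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;

    -- Rewrite every partition of the table, keeping one copy of each row.
    INSERT OVERWRITE TABLE raw_switch.partitiontable
    SELECT DISTINCT * FROM raw_switch.partitiontable;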
Kindly suggest how to resolve this issue. Do we need to change any of the default parameters mentioned above, or some other parameters that we have missed? We also hope we are running the correct insert overwrite query to remove the duplicate records.