Created 06-18-2019 03:02 PM
Hi all / @Shu,
In my project, duplicate records are being created in some random cases while saving, so we wrote a few queries, shown below, to remove the duplicates.
Step 1:
create view db1.temp_no_duplicates as select distinct * from db2.main_table_with_duplicates;
This creates a temp view over the main table and saves only the distinct records into it (the duplicate rows share the same primary keys); we executed this query using HiveContext, as sketched below.
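For context, a minimal sketch of how Step 1 might be submitted through HiveContext. This assumes Spark 1.x, where HiveContext is the entry point, with an existing SparkContext named sc; none of this setup appears in the original post.

import org.apache.spark.sql.hive.HiveContext

// Assumption: Spark 1.x with an existing SparkContext `sc`
val hiveContext = new HiveContext(sc)

// Step 1: define a view that keeps only the distinct rows of the main table
hiveContext.sql(
  "create view db1.temp_no_duplicates as " +
  "select distinct * from db2.main_table_with_duplicates")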
Step 2:
insert overwrite table db2.main_table_with_duplicates select * from db1.temp_no_duplicates;
This overwrites the main table with the records from the temp view; a sketch of this step follows.
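Step 2 would then run through the same context, and this is the statement that raises the error shown below (same assumptions as the sketch above):

// Step 2: overwrite the main table from the view. A view is only a stored
// query over db2.main_table_with_duplicates, so Spark ends up reading from
// the very path it is asked to overwrite.
hiveContext.sql(
  "insert overwrite table db2.main_table_with_duplicates " +
  "select * from db1.temp_no_duplicates")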
While executing Step 2, we hit the following error:
org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is also being read from.;
Is it possible to overwrite like this?
Thank You.
Created 06-19-2019 01:06 AM
This job will work fine in Hive, but Spark cannot overwrite a path it is also reading from, so a different approach is needed.
Check this similar thread, which covers a similar case.
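One common workaround (a sketch only; not necessarily what the linked thread describes) is to stage the distinct rows in a real table rather than a view, so the final overwrite reads from a different storage path. The name db1.temp_no_duplicates_tbl is hypothetical, and hiveContext is reused from the sketches above.

// Materialize the distinct rows in a staging table with its own storage path
hiveContext.sql(
  "create table db1.temp_no_duplicates_tbl as " +
  "select distinct * from db2.main_table_with_duplicates")

// The overwrite no longer reads from the path it writes to
hiveContext.sql(
  "insert overwrite table db2.main_table_with_duplicates " +
  "select * from db1.temp_no_duplicates_tbl")

// Drop the staging table once the overwrite succeeds
hiveContext.sql("drop table db1.temp_no_duplicates_tbl")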
If the answer helps resolve the issue, log in and click the Accept button below to close this thread. This will help other community users find answers quickly 🙂
Created 10-12-2020 12:22 AM
Why doesn't Spark work like Hive?
Just create a temporary directory to store the final files, then rename it into place, as sketched below.
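A rough sketch of that idea, reusing hiveContext and sc from the earlier sketches; both paths are hypothetical placeholders, and this assumes the table data is stored as Parquet:

import org.apache.hadoop.fs.{FileSystem, Path}

// Write the deduplicated rows to a temporary directory first
val deduped = hiveContext.table("db2.main_table_with_duplicates").distinct()
deduped.write.mode("overwrite").parquet("/tmp/main_table_staging")

// Swap the staged files into the table's location
val fs = FileSystem.get(sc.hadoopConfiguration)
val target = new Path("/apps/hive/warehouse/db2.db/main_table_with_duplicates")
fs.delete(target, true)                                // remove the old files
fs.rename(new Path("/tmp/main_table_staging"), target) // move the new ones in

// Tell Spark to pick up the new files
hiveContext.refreshTable("db2.main_table_with_duplicates")

Note that this swaps files underneath the metastore, so it is only safe if nothing else is reading or writing the table at the same time.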