Hi All/ @Shu,
In my project, duplicate records are sometimes created while saving, so we wrote the queries below to remove the duplicates.
create view db1.temp_no_duplicates as select distinct * from db2.main_table_with_duplicates;
This creates a temporary view on the main table that keeps only the distinct records (DISTINCT applied over the primary keys); we executed the query using HiveContext.
insert overwrite table db2.main_table_with_duplicates select * from db1.temp_no_duplicates;
This overwrites the main table with the records from the temp view.
While executing this, we get the following error:
org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is also being read from.;
Is it possible to overwrite like this?
This job will work fine in Hive, but Spark does not allow overwriting a table that is also being read from in the same query, so you need a workaround.
Check this similar thread, which deals with the same case.
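For example, one common pattern (a sketch only; the intermediate table name here is assumed) is to materialize the distinct rows into a real table rather than a view. A view is just the SELECT over the same underlying path, so Spark ends up reading and writing the same location; a materialized table gives the read a different path than the write:

create table db1.temp_no_duplicates_tbl as
select distinct * from db2.main_table_with_duplicates;

insert overwrite table db2.main_table_with_duplicates
select * from db1.temp_no_duplicates_tbl;

drop table db1.temp_no_duplicates_tbl;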
Why doesn't Spark work like Hive here? A simple workaround is to write the final files to a temporary directory first, and then rename it over the original location.
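The "write to a temp location, then rename" idea above can be sketched in plain Python (no Spark; the CSV file, the `id` key field, and the helper name are all illustrative assumptions, not part of the original thread):

```python
import csv
import os
import tempfile


def dedupe_to_temp_then_rename(src_path: str, key_field: str) -> None:
    """Keep the first row per key, write the result to a temp file,
    then atomically replace the original file with it."""
    seen = set()
    rows = []
    with open(src_path, newline="") as f:
        reader = csv.DictReader(f)
        fields = reader.fieldnames
        for row in reader:
            if row[key_field] not in seen:  # drop duplicate keys
                seen.add(row[key_field])
                rows.append(row)

    # Write to a temp file in the same directory so the final
    # os.replace is a rename on the same filesystem (atomic).
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(src_path) or ".")
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    os.replace(tmp_path, src_path)  # rename over the original
```

The key point is the same as in the Spark case: the deduplicated output is fully written to a separate location before the original is replaced, so the source is never overwritten while it is still being read.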