
Insert overwrite within the same table in Spark

New Contributor

Hi All/ @Shu,


In my project, duplicate records are being created in some random cases while saving, so we wrote a few queries, shown below, to remove the duplicates.

Step 1:

create view db1.temp_no_duplicates as select distinct * from db2.main_table_with_duplicates;

This creates a temp view on the main table, keeping only the distinct records (based on the primary keys); we executed this query using HiveContext.

Step 2:

insert overwrite table db2.main_table_with_duplicates select * from db1.temp_no_duplicates;

This overwrites the main table with the records from the temp view.


While executing this, we are facing an error:

org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is also being read from.;


Is it possible to overwrite like this?


Thank You.


2 Replies

Master Guru

@Veera Pavan

This job will work fine in Hive, but in Spark follow these steps (a minimal sketch is shown after the list):

  1. Write the data to a temporary table first.
  2. Then select from the temporary table.
  3. Insert overwrite the final table.
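
A minimal sketch of those steps from spark-shell, assuming a Hive-enabled SparkSession named spark and the table names from the question (the temporary table name is only a placeholder):

// Sketch only: assumes a Hive-enabled SparkSession named spark;
// db1.temp_no_duplicates is a placeholder temporary table name.

// 1. Materialize the de-duplicated rows into a real temporary table.
//    A view is not enough, because the view still reads the main table's
//    files at the moment of the overwrite.
spark.sql("DROP TABLE IF EXISTS db1.temp_no_duplicates")
spark.sql(
  """CREATE TABLE db1.temp_no_duplicates AS
    |SELECT DISTINCT * FROM db2.main_table_with_duplicates""".stripMargin)

// 2. and 3. Select from the temporary table and insert overwrite the final
//    table; the overwrite now reads the temp table's files, not its own path.
spark.sql(
  """INSERT OVERWRITE TABLE db2.main_table_with_duplicates
    |SELECT * FROM db1.temp_no_duplicates""".stripMargin)

// Clean up the temporary table afterwards.
spark.sql("DROP TABLE IF EXISTS db1.temp_no_duplicates")

The key difference from the view in the question is that the temporary table physically copies the data, so Spark is no longer reading from and overwriting the same path.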

Check this related thread, which covers a similar case.

-

If the answer helps to resolve the issue, log in and click the Accept button below to close this thread. This will help other community users find answers quickly 🙂

New Contributor

Why doesn't Spark work like Hive? It could just create a temporary directory to store the final files and finally rename it.
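
For reference, a hedged sketch of doing that staging step manually with the DataFrame API; the staging path below is only a placeholder, not something Spark creates for you:

// Sketch only: assumes a Hive-enabled SparkSession named spark; the staging
// path is a placeholder. This emulates "write to a temporary directory first"
// by hand, then overwrites the original table from that copy.
val stagingPath = "/tmp/main_table_no_duplicates"

// Materialize the de-duplicated rows outside the table's own path.
spark.table("db2.main_table_with_duplicates")
  .distinct()
  .write
  .mode("overwrite")
  .parquet(stagingPath)

// Read the staged copy back and overwrite the original table; Spark no longer
// reads the target path while writing it, so the AnalysisException goes away.
spark.read.parquet(stagingPath)
  .write
  .mode("overwrite")
  .insertInto("db2.main_table_with_duplicates")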