Insert overwrite within the same table in Spark

New Contributor

Hi all, @Shu,

In my project, duplicate records are sometimes created at random when records are saved, so we wrote the queries below to remove the duplicates.

Step 1:

create view db1.temp_no_duplicates as select distinct * from db2.main_table_with_duplicates;

This creates a temporary view over the main table that removes duplicates by applying DISTINCT (across all columns, including the primary keys). We executed this query using HiveContext.

Step 2:

insert overwrite table db2.main_table_with_duplicates select * from db1.temp_no_duplicates;

This overwrites the main table with the records from db1.temp_no_duplicates.

When we execute this, we get the following error:

org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is also being read from.;

Is it possible to overwrite like this?

Thank You.


Master Guru

@Veera Pavan

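This job will work fine in Hive, but Spark refuses it: the view created in Step 1 is only a stored query, so Step 2 still reads the files of db2.main_table_with_duplicates while the overwrite would delete them. In Spark, follow these steps instead: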

  1. Write the data to a temporary table first.
  2. Then select from the temporary table.
  3. Insert overwrite the final table.
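The three steps above can be sketched in Spark SQL. This is a minimal sketch, assuming an unpartitioned table and the table names from the question; `db1.temp_no_duplicates_tbl` is a hypothetical staging table name. The staging table is created with CREATE TABLE ... AS SELECT rather than as a view, so its rows are copied to separate files before the overwrite starts.

```sql
-- Step 1: materialize the de-duplicated rows into a real staging table.
-- (A view would still read db2.main_table_with_duplicates at query time.)
-- db1.temp_no_duplicates_tbl is a hypothetical name chosen for this sketch.
DROP TABLE IF EXISTS db1.temp_no_duplicates_tbl;

CREATE TABLE db1.temp_no_duplicates_tbl AS
SELECT DISTINCT * FROM db2.main_table_with_duplicates;

-- Steps 2 and 3: select from the staging table and overwrite the final table.
-- The source is now the staging table's files, so Spark's overwrite check passes.
INSERT OVERWRITE TABLE db2.main_table_with_duplicates
SELECT * FROM db1.temp_no_duplicates_tbl;

-- Optional clean-up once the overwrite has succeeded.
DROP TABLE db1.temp_no_duplicates_tbl;
```

The staging table holds a full copy of the data, so the job needs room for two copies of the table while it runs.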

Check this thread about a similar case.



New Contributor

Why doesn't Spark work like Hive? It could just write the final files to a temporary directory and rename that directory at the end.
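Hive behaves roughly this way: it writes query output to a staging directory and moves the files into the table location when the query commits. A comparable "write elsewhere, then rename" can be done by hand at the table level in Spark SQL. This is a minimal sketch, assuming an unpartitioned, managed table; `db2.main_table_dedup` is a hypothetical name. CREATE TABLE ... LIKE copies the original table's schema and storage format, so the renamed table keeps them. Unlike Hive's commit, this swap is not atomic: between the DROP and the RENAME the table does not exist for other readers.

```sql
-- Create an empty table with the same definition as the original.
-- db2.main_table_dedup is a hypothetical name chosen for this sketch.
DROP TABLE IF EXISTS db2.main_table_dedup;
CREATE TABLE db2.main_table_dedup LIKE db2.main_table_with_duplicates;

-- Write the de-duplicated rows to the new table. This reads from one path
-- and writes to another, so Spark's overwrite check does not trigger.
INSERT OVERWRITE TABLE db2.main_table_dedup
SELECT DISTINCT * FROM db2.main_table_with_duplicates;

-- Swap the new table in. For a managed table, DROP TABLE also deletes the
-- old data files, so run this only after the insert above has succeeded.
DROP TABLE db2.main_table_with_duplicates;

-- Spark renames only within the same database, which is the case here.
ALTER TABLE db2.main_table_dedup RENAME TO db2.main_table_with_duplicates;
```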