Support Questions

pavan_veera9 · ‎06-18-2019

Hi All/ @Shu,

In my project, duplicate records are occasionally created while saving, in some random cases. So we wrote the queries below to remove the duplicates.

Step 1:

create view db1.temp_no_duplicates as select distinct * from db2.main_table_with_duplicates;

This creates a temporary view on the main table that selects its records with DISTINCT applied. We executed this query using a Hive context.

Step 2:

insert overwrite table db2.main_table_with_duplicates select * from db1.temp_no_duplicates;

This overwrites the main table with the records from the temporary view.

While executing this, we get the following error:

org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is also being read from.;

Is it possible to overwrite like this?

Thank You.

Shu_ashu · ‎06-19-2019

@Veera Pavan

This job works fine in Hive, but in Spark follow these steps:

1. Write the data to a temporary table first.
2. Then select from the temporary table.
3. Insert overwrite the final table.
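
The steps above can be sketched in Spark SQL as follows. This is a minimal sketch, not a tested job: the staging table name `db1.temp_no_duplicates_tbl` is hypothetical, and it assumes the session has Hive support enabled. The key difference from the original Step 1 is that a view stores no data and is re-evaluated against the main table at query time, so Spark still sees the main table's path being read during the overwrite. A real table materializes the rows first.

```sql
-- Step 1: materialize the deduplicated rows into a real staging table
-- (hypothetical name), instead of a view over the main table.
DROP TABLE IF EXISTS db1.temp_no_duplicates_tbl;

CREATE TABLE db1.temp_no_duplicates_tbl AS
SELECT DISTINCT *
FROM db2.main_table_with_duplicates;

-- Steps 2 and 3: read from the staging table and overwrite the main table.
-- The source path now differs from the target path.
INSERT OVERWRITE TABLE db2.main_table_with_duplicates
SELECT *
FROM db1.temp_no_duplicates_tbl;

-- Optional cleanup once the overwrite has been verified.
DROP TABLE db1.temp_no_duplicates_tbl;
```

It is worth comparing row counts between the staging table and the main table before dropping the staging table, since the overwrite replaces the original data.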

See this similar thread for a similar case.

If the answer helped resolve the issue, log in and click the Accept button below to close this thread. This will help other community users find answers quickly 🙂

Fly_boy · ‎10-12-2020

Why doesn't Spark work like Hive?
It could just create a temporary directory to store the final files, and rename it at the end.
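
A user-level analogue of this "write elsewhere, then rename" idea can be sketched at the table level in Spark SQL. This is only an illustration under assumptions: the names `db2.main_table_dedup` and `db2.main_table_old` are hypothetical, and a table created with CREATE TABLE AS SELECT does not necessarily keep the original table's partitioning, storage format, or table properties, so those would need to be checked or declared explicitly.

```sql
-- Write the deduplicated rows to a new table (hypothetical name).
CREATE TABLE db2.main_table_dedup AS
SELECT DISTINCT *
FROM db2.main_table_with_duplicates;

-- Swap the tables by renaming: keep the old data under a backup name,
-- then give the new table the original name.
ALTER TABLE db2.main_table_with_duplicates RENAME TO db2.main_table_old;
ALTER TABLE db2.main_table_dedup RENAME TO db2.main_table_with_duplicates;

-- Drop the backup only after the new table has been verified.
DROP TABLE db2.main_table_old;
```

Unlike a single INSERT OVERWRITE, the swap is not atomic: between the two renames the original table name does not exist, so concurrent readers may fail briefly.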

Cloudera Community

Insert overwrite within the same table in Spark.