Hive and Impala duplicating rows when INSERT OVERWRITE

New Contributor

Hi there!!!

I am using Hadoop on Cloudera and was trying to do an INSERT OVERWRITE from one table into another.

The problem:

Table1 has 1561 rows

INSERT INTO TABLE Table2 SELECT * from Table1;

After this I have 1715 rows in Table2.

I have also tried to create Table2 as:

CREATE TABLE Table2 AS SELECT * from Table1;

After this I have 1715 rows in Table2 as well.

I have checked, and the extra rows are all duplicates:

select count(*)
from Table2
group by all_columns
having count(*) > 1;

This returns 154 duplicated rows.

Why is this happening?

I have tried this in both Hive and Impala.

Can someone help me to solve this?

Re: Hive and Impala duplicating rows when INSERT OVERWRITE

Master Collaborator

I can't really conceive of any explanation other than that the duplicate rows existed in the original table.
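
One way to test that explanation is to run the same checks against the source table, in the same engine and session used for the copy. The queries below are only a sketch: `col1` and `col2` are placeholder column names standing in for the full column list of Table1, which is not shown in the question.

```sql
-- Total number of rows in the source table.
SELECT COUNT(*) AS total_rows
FROM Table1;

-- Number of distinct rows. col1, col2 are placeholders:
-- list every column of Table1 here.
SELECT COUNT(*) AS distinct_rows
FROM (SELECT DISTINCT col1, col2 FROM Table1) t;

-- Groups of identical rows in the source table, with how many
-- copies of each exist. Same placeholder columns as above.
SELECT col1, col2, COUNT(*) AS copies
FROM Table1
GROUP BY col1, col2
HAVING COUNT(*) > 1;
```

If the last query returns rows, the duplicates are already present in Table1 and the copy is simply carrying them over. If `total_rows` comes back as 1715 rather than 1561, the earlier count of Table1 was probably not a fresh `COUNT(*)` on the same data.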
