Created on 09-05-2019 09:39 AM - last edited on 09-05-2019 09:55 AM by VidyaSargur
Hi there!!!
I am using Hadoop on Cloudera and was trying to do an INSERT OVERWRITE from one table into another.
The problem:
Table1 has 1561 rows
INSERT INTO TABLE Table2 SELECT * from Table1;
After this I have 1715 rows on Table2.
I have also tried creating Table2 with:
CREATE TABLE Table2 AS SELECT * from Table1;
After this I have 1715 rows on Table2 as well.
I have checked, and the extra rows are all duplicates:
select count(*)
from Table2
group by all_columns
having count(*) > 1;
This returns 154 duplicated rows.
Why is this happening?
I have tried these using Hive and Impala.
Can someone help me to solve this?
Created 09-20-2019 09:47 AM
I can't really conceive of any explanation other than that the duplicate rows existed in the original table.
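One way to test that explanation is to run the same duplicate check against the source table rather than the copy. The queries below are only a sketch: col_a, col_b and col_c are hypothetical placeholders for the full column list of Table1, which the post does not show.

```sql
-- col_a, col_b, col_c are hypothetical; substitute every column of Table1.

-- Total number of rows in the source table.
SELECT COUNT(*) AS total_rows
FROM Table1;

-- Number of distinct rows in the source table.
SELECT COUNT(*) AS distinct_rows
FROM (SELECT DISTINCT * FROM Table1) t;

-- The duplicated rows themselves, with the number of copies of each.
SELECT col_a, col_b, col_c, COUNT(*) AS copies
FROM Table1
GROUP BY col_a, col_b, col_c
HAVING COUNT(*) > 1;
```

If total_rows is larger than distinct_rows, or the last query returns any rows, the duplicates are already present in Table1 and were simply carried over into Table2.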