I am using Hadoop on Cloudera and was trying to do an INSERT OVERWRITE from one table into another.
Table1 has 1561 rows.
INSERT INTO TABLE Table2 SELECT * from Table1;
After this I have 1715 rows in Table2.
I have also tried creating Table2 with:
CREATE TABLE Table2 AS SELECT * from Table1;
After this I have 1715 rows in Table2 as well.
I have checked, and the extra rows are all duplicates:
select all_columns, count(*) from Table2
group by all_columns
having count(*) > 1;
This shows 154 duplicated rows.
Why is this happening?
I have tried both statements in Hive and in Impala.
Can someone help me to solve this?
I can't really conceive of any explanation other than that the duplicate rows existed in the original table.
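One way to test that explanation is to run the same duplicate check against Table1 itself. The queries below are only a sketch: col_a and col_b are placeholder column names standing in for the full column list (they are not from the original post), and the syntax is assumed to be accepted by both Hive and Impala.

```sql
-- Total number of rows in the source table.
SELECT COUNT(*) AS total_rows FROM Table1;

-- Number of distinct rows in the source table.
-- If this is lower than total_rows, Table1 already contains duplicates.
SELECT COUNT(*) AS distinct_rows
FROM (SELECT DISTINCT * FROM Table1) t;

-- List the duplicated rows in the source table.
-- col_a, col_b are hypothetical; replace them with every column of Table1.
SELECT col_a, col_b, COUNT(*) AS copies
FROM Table1
GROUP BY col_a, col_b
HAVING COUNT(*) > 1;
```

If total_rows comes back as 1715 rather than 1561, the earlier count of Table1 was probably taken from a different query or an outdated view of the table, and the copy itself did not add any rows.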