Compare two Hive tables using hash values of the primary and non-primary key columns, and list each unique ID flagged as update, insert, or delete?

I have two tables with almost 200 columns each. One column is defined as the primary key, and the other 199 are non-primary-key columns. I need to compare the two tables and list each primary key (unique ID) together with its change type (update, delete, or insert), determined by comparing a hash of the non-primary-key columns from both tables.

Table name: customer, partitioned by date.

create table customer (customer_id int, cus_name string, cus_phn string, cus_zip int, cus_age int, cus_account int ....................... 200 columns) partitioned on transaction_date.

 

I have to compare the data for transaction_date 01-01-2020 and 02-01-2020 and output customer_id and cdc_flag (Update, Insert, or Delete).

For example, say that on 2nd Jan an existing customer (customer_id 101) changes their contact number. Comparing the two transaction dates should then return (101 Update). As another example, if customer_id 203 deletes their account on 2nd Jan, the result should be (203 Delete).
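
To make the expected result concrete, the comparison described above can be sketched as follows. This is only an illustration of the logic in plain Python, not a Hive query. The function names `row_hash` and `cdc_flags`, the choice of MD5, the separator and NULL marker, and the sample rows (including customer_id 305 and 410) are all assumptions made for the example. In Hive, the same idea would presumably be a full outer join of the two partitions on customer_id, comparing one hash per row.

```python
import hashlib


def row_hash(row, key="customer_id"):
    """Hash all non-key columns of one row (a dict) into a single hex digest."""
    parts = []
    for col in sorted(row):  # fixed column order so hashes are comparable
        if col == key:
            continue
        value = row[col]
        # Assumed NULL marker, so that NULL and empty string hash differently.
        parts.append("<NULL>" if value is None else str(value))
    # Assumed separator, so that ("ab", "c") and ("a", "bc") hash differently.
    return hashlib.md5("\x1f".join(parts).encode("utf-8")).hexdigest()


def cdc_flags(old_rows, new_rows, key="customer_id"):
    """Return {key: 'Insert' | 'Update' | 'Delete'} for rows that changed."""
    old = {r[key]: row_hash(r, key) for r in old_rows}
    new = {r[key]: row_hash(r, key) for r in new_rows}
    changes = {}
    for k in old.keys() | new.keys():
        if k not in new:
            changes[k] = "Delete"   # present on day 1 only
        elif k not in old:
            changes[k] = "Insert"   # present on day 2 only
        elif old[k] != new[k]:
            changes[k] = "Update"   # same key, different non-key hash
    return changes


# Hypothetical sample data for the two transaction dates (3 of the ~200 columns).
day1 = [
    {"customer_id": 101, "cus_name": "A", "cus_phn": "555-0100"},
    {"customer_id": 203, "cus_name": "B", "cus_phn": "555-0101"},
    {"customer_id": 305, "cus_name": "C", "cus_phn": None},
]
day2 = [
    {"customer_id": 101, "cus_name": "A", "cus_phn": "555-0199"},  # phone changed
    {"customer_id": 305, "cus_name": "C", "cus_phn": None},        # unchanged
    {"customer_id": 410, "cus_name": "D", "cus_phn": "555-0102"},  # new customer
]

print(sorted(cdc_flags(day1, day2).items()))
# [(101, 'Update'), (203, 'Delete'), (410, 'Insert')]
```

Unchanged rows such as customer_id 305 are left out of the result, which matches the requirement of listing only the IDs that have a change.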
