Support Questions

Find answers, ask questions, and share your expertise
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

How to move only newly inserted or updated records alone from hive stage table into another hive table

New Contributor

In Hive i have a staging table 'Stg_emp' . Below is sample records for stg_emp table.

Empid Ename Last_date

----- ----- ---------

1 Eliza 19/02/2018 10:45:02

2 Mike 18/02/2018 09:45:04

3 James 18/02/2018 08:25:01

For this staging table, i will get newly inserted rows through Sqoop_job1 and Updated rows through Sqoop_job2



sqoop job --create sqoop_job1 --import --connect jdbc:oracle:thin:@ --username sq_user --password sq_us --table Stg_emp --incremental append -check-column empid --target-dir /user/sq_sample/stg_emp -m 1 --merge-key empid



sqoop job --create sqoop_job2 --import --connect jdbc:oracle:thin:@ --username sq_user --password sq_us --table Stg_emp --incremental lastmodified -check-column last_date --target-dir /user/sq_sample/stg_emp -m 1 --merge-key empid

My Questions on above sample table.

1. If row contains Empid 3 is deleted then how will i sink those deletions in hive.

2. If i want to move only newly inserted rows of stg_emp into another hive table , how to do that

3. If i want to move only updated rows of stg_emp into another hive table, how to do that.

Please correct me if my sqoop jobs are wrong.


New Contributor

I got answer for 2nd and 3rd question as below.

merge into emp
using ( select * from stg_emp) sub
on sub.empid = emp.empid
when matched then update set ename = sub.ename
when not matched then insert values (sub.empid, sub.ename, sub.last_date);
Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.