Incremental ETL to Apache Hive

Hi everyone,

I want to build an ETL pipeline from an RDBMS to Apache Hive, and I am using the following approach:

Source Data --> Hive Staging Table --> Hive Table

First, I load data into a Hive staging table incrementally, based on a date column (the max date value already loaded is stored in a metadata table in Hive). Then I use a MERGE statement to merge the staging table into my target table.

Is there any other recommended approach?

@Asim- Unless your final table has to be a Hive managed (ACID) table, you could incrementally update the Hive table directly using Sqoop.

e.g.

sqoop import \
  --connect jdbc:oracle:thin:@xx.xx.xx.xx:1521:ORCL \
  --table EMPLOYEE \
  --username user1 --password welcome1 \
  --incremental lastmodified \
  --merge-key employee_id \
  --check-column emp_timestamp \
  --target-dir /usr/hive/warehouse/external/empdata/
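
To make the semantics of --incremental lastmodified combined with --merge-key more concrete, the Python sketch below models them in memory. Only rows whose --check-column value is newer than the previous --last-value are imported, and the merge step lets each imported row replace the existing row with the same merge key. This is a conceptual model, not Sqoop's actual implementation (which operates on the files in --target-dir); the function names and the dict-per-row layout are hypothetical, and timestamps are assumed to be ISO strings so they compare correctly.

```python
from typing import Any

Row = dict[str, Any]  # hypothetical row layout: column name -> value


def lastmodified_import(source: list[Row], check_column: str, last_value: str) -> list[Row]:
    # Keep only rows modified after the previous import's last value.
    return [row for row in source if row[check_column] > last_value]


def merge_by_key(existing: list[Row], imported: list[Row], merge_key: str) -> list[Row]:
    # Start from the existing dataset, then let each imported row overwrite
    # the row with the same key (or be appended if the key is new).
    merged = {row[merge_key]: row for row in existing}
    for row in imported:
        merged[row[merge_key]] = row
    return list(merged.values())
```

For example, if employee 2 was updated and employee 3 was inserted after the last import, lastmodified_import returns just those two rows, and merge_by_key produces a dataset with one row per employee_id, holding the newest version of employee 2.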

Otherwise, the approach you are already using is actually the one Cloudera recommends.