
Incremental ETL to Apache Hive


Hi everyone,


I want to build an ETL pipeline from an RDBMS to Apache Hive, and I am using the approach below:


Source Data --> Hive Staging Table --> Hive Table

First I load data into the Hive staging table incrementally, filtering on a date column (the max date value is stored in a metadata table in Hive), and then I run a MERGE statement between my target table and the staging table.
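A minimal HiveQL sketch of that staging-then-merge flow might look like the following. The table and column names (`employee`, `employee_staging`, `etl_metadata`, `employee_id`, `emp_timestamp`) are illustrative assumptions, and the watermark is passed in as a `hivevar` that the orchestration script would read from the metadata table beforehand. Note that `MERGE` requires the target table to be a transactional (ACID) table.

```sql
-- Assumed schema: employee(employee_id INT, name STRING, emp_timestamp TIMESTAMP)
-- ${hivevar:last_load_ts} is the max date value previously saved in etl_metadata.

-- 1. Stage only the rows newer than the last watermark.
INSERT OVERWRITE TABLE employee_staging
SELECT employee_id, name, emp_timestamp
FROM employee_source
WHERE emp_timestamp > '${hivevar:last_load_ts}';

-- 2. Upsert the staged rows into the target (must be an ACID table).
MERGE INTO employee AS t
USING employee_staging AS s
  ON t.employee_id = s.employee_id
WHEN MATCHED THEN
  UPDATE SET name = s.name, emp_timestamp = s.emp_timestamp
WHEN NOT MATCHED THEN
  INSERT VALUES (s.employee_id, s.name, s.emp_timestamp);

-- 3. Advance the watermark after a successful merge.
INSERT OVERWRITE TABLE etl_metadata
SELECT 'employee' AS table_name, MAX(emp_timestamp) AS last_load_ts
FROM employee_staging;
```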


Any other recommended approach?





@Asim- Unless your final table has to be a Hive managed (ACID) table, you could incrementally update the Hive table directly using Sqoop.


sqoop import \
  --connect jdbc:oracle:thin:@xx.xx.xx.xx:1521:ORCL \
  --table EMPLOYEE \
  --username user1 \
  --password welcome1 \
  --incremental lastmodified \
  --merge-key employee_id \
  --check-column emp_timestamp \
  --target-dir /usr/hive/warehouse/external/empdata/
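If you want Sqoop to track the watermark for you instead of maintaining it in a metadata table, one option (a sketch, assuming a Sqoop metastore is available) is to wrap the same import in a saved Sqoop job; a saved job records the last imported `--check-column` value and reuses it on each run:

```shell
# Create a saved job (note the required "--" before "import").
sqoop job --create emp_incremental -- import \
  --connect jdbc:oracle:thin:@xx.xx.xx.xx:1521:ORCL \
  --table EMPLOYEE \
  --username user1 \
  --password welcome1 \
  --incremental lastmodified \
  --merge-key employee_id \
  --check-column emp_timestamp \
  --target-dir /usr/hive/warehouse/external/empdata/

# Each execution picks up from the stored last-value automatically.
sqoop job --exec emp_incremental
```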

Otherwise, the approach you are taking is actually the way Cloudera recommends doing it.