Support Questions
Find answers, ask questions, and share your expertise

How to take updated records along with incremental import from RDBMS table to a Hive table?


Im working with Sqoop incremental import by taking the data everyday into my hive table. I have the following scenario: I have an RDBMS table: 'empdata' with columns

id  name    city
1   Sid     Amsterdam
2   Bob     Delhi
3   Sun     Dubai
4   Rob     London

I am importing the data into Hive, using Sqoop incremental import through a cron job which shell script to do the work.


DATE=$(date +"%d-%m-%y")
while IFS=":" read -r server dbname tablename; do
sqoop import --connect jdbc:mysql://$server/$dbname --table $tablename --username root --password cloudera --hive-import --hive-table dynpart --hive-partition-key 'thisday' --hive-partition-value $DATE --target-dir '/user/hive/newimp5' --incremental-append --check-column id --last-value $(hive -e "select max(id) from $tablename");

The above script for incremental load is working fine. But now I have another requirement, which is to check if there are any updates to previous records. Like if the record:

1 Rob London is updated to 1 Rob NewYork

I need to take that updated record(s) along with the incremental import  but only the updated value should be present in Hive table so that I don't have duplicate values either. Could anyone tell me how can I accomplish it ?


; ;