Reply
New Contributor
Posts: 4
Registered: ‎03-28-2017

How to take updated records along with incremental import from RDBMS table to a Hive table?

[ Edited ]

Im working with Sqoop incremental import by taking the data everyday into my hive table. I have the following scenario: I have an RDBMS table: 'empdata' with columns

id  name    city
1   Sid     Amsterdam
2   Bob     Delhi
3   Sun     Dubai
4   Rob     London

I am importing the data into Hive, using Sqoop incremental import through a cron job which shell script to do the work.

 

#!/bin/bash
DATE=$(date +"%d-%m-%y")
while IFS=":" read -r server dbname tablename; do
sqoop import --connect jdbc:mysql://$server/$dbname --table $tablename --username root --password cloudera --hive-import --hive-table dynpart --hive-partition-key 'thisday' --hive-partition-value $DATE --target-dir '/user/hive/newimp5' --incremental-append --check-column id --last-value $(hive -e "select max(id) from $tablename");
done</home/cloudera/Desktop/MyScripts/tables.txt

The above script for incremental load is working fine. But now I have another requirement, which is to check if there are any updates to previous records. Like if the record:

1 Rob London is updated to 1 Rob NewYork

I need to take that updated record(s) along with the incremental import  but only the updated value should be present in Hive table so that I don't have duplicate values either. Could anyone tell me how can I accomplish it ?

Posts: 258
Topics: 11
Kudos: 40
Solutions: 25
Registered: ‎09-02-2016

Re: How to take updated records along with incremental import from RDBMS table to a Hive table?

Announcements
New Solutions