Created 02-28-2016 05:02 PM
@Mahesh Deshmukh Tutorial http://hadooped.blogspot.in/2013/05/apache-sqoop-for-data-integration.html
Go to
Created 02-28-2016 06:42 PM
Thanks, Neeraj
Created 02-29-2016 12:07 AM
@Mahesh Deshmukh The 2nd link has a step-by-step demo.
Created 02-28-2016 08:29 PM
The Sqoop merge tool allows you to combine two datasets where entries in one dataset should overwrite entries of an older dataset.
For example, an incremental import run in last-modified mode will generate multiple datasets in HDFS where successively newer data appears in each dataset.
The merge tool will "flatten" two datasets into one, taking the newest available records for each primary key.
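For reference, the merge can also be run as a standalone tool after two separate imports. A minimal sketch, where the directory names are placeholders and the jar/record class come from a prior sqoop codegen run against the same table:
sqoop merge --new-data /sqoop/empdata_new --onto /sqoop/empdata_old \
--target-dir /sqoop/empdata_merged \
--jar-file emp.jar --class-name emp \
--merge-key empid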
# Let's create a test database in MySQL
create database test;
use test;
# Let's create an employee table
create table emp(empid int not null primary key, empname VARCHAR(20), age int, salary int, city VARCHAR(20), cr_date date);
# Describe the table
mysql> describe emp;
+---------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+-------+
| empid | int(11) | NO | PRI | NULL | |
| empname | varchar(20) | YES | | NULL | |
| age | int(11) | YES | | NULL | |
| salary | int(11) | YES | | NULL | |
| city | varchar(20) | YES | | NULL | |
| cr_date | date | YES | | NULL | |
+---------+-------------+------+-----+---------+-------+
# Load the employee table
LOAD DATA LOCAL INFILE '/Users/gangadharkadam/Downloads/empdata.csv'
INTO TABLE emp
FIELDS TERMINATED BY ','
ENCLOSED BY '/'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(empid, empname, age, salary, city, @var1)
SET cr_date = STR_TO_DATE(@var1, '%m/%d/%Y');
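To sanity-check the load before importing, a couple of quick queries (the row count will depend on your CSV):
mysql> select count(*) from emp;
mysql> select * from emp limit 5;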
# Import the emp table to HDFS using the command below
sqoop import --connect jdbc:mysql://localhost/test --table emp --username hive --password hive --target-dir /sqoop/empdata/
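You can confirm the import landed in HDFS; part-m-* is the default file naming for a map-only import:
hdfs dfs -ls /sqoop/empdata/
hdfs dfs -cat /sqoop/empdata/part-m-00000 | head -5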
# Update a few records in the test.emp table as below:
update emp set cr_date='2016-02-28' where empname like "A%";
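To see which rows the update touched (your matching rows will differ):
mysql> select empid, empname, cr_date from emp where empname like "A%" limit 5;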
# Now merge these updated records with the HDFS file using the --merge-key option.
# The merge tool will "flatten" the two datasets into one.
sqoop import --connect jdbc:mysql://localhost/test --table emp \
--username hive --password hive --incremental lastmodified --merge-key empid --check-column cr_date \
--target-dir /sqoop/empdata/
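The merge job writes its output through reducers, so the result files are named part-r-* rather than part-m-*. Assuming the default naming, the touched rows can be pulled out with:
hdfs dfs -ls /sqoop/empdata/
hdfs dfs -cat /sqoop/empdata/part-r-00000 | grep 2016-02-28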
Below are some of the updated records, now showing the new cr_date:
750,Anne Ward,57,99496,La Sierpe,2016-02-28
38,Anne Morrison,36,53358,San Pedro Carchá,2016-02-28
718,Adam Ford,56,98340,Arthur’s Town,2016-02-28
Created 02-28-2016 10:18 PM
Hi @Mahesh Deshmukh, check here for one more example of incremental import with merge, including command-line output. Note that the column in your DB table that you use as "--merge-key" has to be unique. During an incremental import with merge, Sqoop will run 2 MR jobs: the first one is the standard Sqoop map-only job to import the new (updated) records; the second one is the merge MR job. During the merge MR job, Sqoop will look for records with the same "merge-key", like the following ones:
150 mary Feb-10 ... already in HDFS
150 john Feb-21 ... newly imported using incremental import
and will keep the newer record, the one from Feb-21 (assuming the first column is your "merge-key" and the 3rd column is your "check-column").
Created 09-13-2018 03:01 AM
@Neeraj Sabharwal,
What if the source table in the RDBMS has composite primary key columns? Does the Sqoop --merge-key option support specifying the composite PK column(s)?
--
Lokesh