Support Questions


Sqoop Incremental import into Hive table


Hi All,

I want to perform an incremental import using the Sqoop command. I have the following data in a MySQL table.

mysql> select * from data;
+------+------+------------+
| id   | name | join_date  |
+------+------+------------+
|    1 | aaa  | 2017-11-26 |
|    2 | bbb  | 2018-03-06 |
|    3 | ccc  | 2018-04-25 |
+------+------+------------+
3 rows in set (0.02 sec)

I have imported this MySQL data into a Hive table.

hive> select * from datahive;
OK
1       aaa     2017-11-26
2       bbb     2017-11-26
3       ccc     2018-04-25
Time taken: 3.876 seconds, Fetched: 3 row(s)
hive>

Now I am updating one record and inserting a new record into the MySQL table for the incremental Sqoop import.

mysql> update data set name="new_update" where id = 1;
mysql> insert into data values (5,"new_insert","2018-06-01");

Sqoop command:

sqoop import \
--connect "jdbc:mysql://sandbox.hortonworks.com:3306/sqltempdb" \
--username root \
--password hadoop \
--table data \
--hive-import \
--hive-table datahive \
--fields-terminated-by "," \
-m 1 \
--incremental lastmodified \
--check-column join_date \
--last-value "2017-01-01" 

Executing the above Sqoop command gives me the following records:

hive> select * from datahive;
OK
1       aaa     2017-11-26
2       bbb     2017-11-26
3       ccc     2018-04-25
1       new_update      2017-11-26
2       bbb     2017-11-26
3       ccc     2018-04-25
5       new_insert      2018-06-01
Time taken: 30.438 seconds, Fetched: 7 row(s)
hive>

But I only want the updated record and the newly inserted record in the Hive table; I don't want all of the data appended to the existing data.

Required data in Hive:

1	aaa	2017-11-26
2	bbb	2018-03-06
3	ccc	2018-04-25 
1	new_update	2017-11-26
5	new_insert	2018-06-01

Kindly help me find a solution.

Regards,

Jay.

1 REPLY

Maybe you could get more control by sending Sqoop your own custom query:

--query "Insert custom query here"