Hi All,
I want to perform incremental import using the Sqoop command. I have following data into MySQL table.
mysql> select * from data;
+------+------+------------+
| id | name | join_date |
+------+------+------------+
| 1 | aaa | 2017-11-26 |
| 2 | bbb | 2018-03-06 |
| 3 | ccc | 2018-04-25 |
+------+------+------------+
3 rows in set (0.02 sec)
I have imported these MySQL data into hive table.
hive> select * from datahive;
OK
1 aaa 2017-11-26
2 bbb 2017-11-26
3 ccc 2018-04-25
Time taken: 3.876 seconds, Fetched: 3 row(s)
hive>
Now, I am updating one record and inserting a new record in MySQL table for incremental Sqoop import.
mysql> update data set name="new_update" where id = 1;
mysql> insert into data values (5,"new_insert","2018-06-01");
Sqoop command;
sqoop import \
--connect "jdbc:mysql://sandbox.hortonworks.com:3306/sqltempdb" \
--username root \
--password hadoop \
--table data \
--hive-import \
--hive-table datahive \
--fields-terminated-by "," \
-m 1 \
--incremental lastmodified \
--check-column join_date \
--last-value "2017-01-01"
By executing the above Sqoop command I am getting the following records;
hive> select * from datahive;
OK
1 aaa 2017-11-26
2 bbb 2017-11-26
3 ccc 2018-04-25
1 new_update 2017-11-26
2 bbb 2017-11-26
3 ccc 2018-04-25
5 new_insert 2018-06-01
Time taken: 30.438 seconds, Fetched: 7 row(s)
hive>
But I only want the updated record and newly inserted record into Hive table, I don't want to append all data with existing data.
Required Data into Hive:
1 aaa 2017-11-26
2 bbb 2018-03-06
3 ccc 2018-04-25
1 new_update 2017-11-26
5 new_insert 2018-06-01
Kindly help me to reach out this solution.
Regards,
Jay.