
MySQL to Hive with incremental column


I am trying to learn some basics of Sqoop, and I want to insert data from a MySQL table into Hive. This MySQL table receives new data every 5 minutes. I found out how to create a Sqoop job to connect and run the query, but I cannot understand how Sqoop will know the last value of the primary key column so that it extracts only the newer data each time.

 

https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_incremental_imports

 

For example, in the Sqoop command below, do I have to supply the last value myself, or can Sqoop work it out on its own?

Also, does the check-column have to be the primary key column?

 

sqoop job --create <JOBS NAME> \
-- import \
--connect "jdbc:<PATH>" \
--username <USERNAME> \
--password <PASSWORD> \
--target-dir <DIR> \
--table <MYSQL TABLE> \
--hive-import \
--hive-table <HIVE TABLE> \
--fields-terminated-by ',' \
--escaped-by \\ \
--split-by <COLUMN TO BE SPLIT ACROSS MAPPERS> \
--num-mappers 5 \
--incremental append \
--check-column <CHECK COLUMN> \
--last-value <LAST VALUE>
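
If I understand the saved-jobs part of that user guide correctly (this is just my reading of the docs, not something I have verified), the Sqoop metastore keeps the updated last value of the check column after each run of a saved job, so I would only run something like the following and not pass --last-value by hand every time:

# run the saved job; newer rows since the previous run should be appended
sqoop job --exec <JOBS NAME>

# show the parameters saved for the job (this should include the stored last value, if I read the docs right)
sqoop job --show <JOBS NAME>

# list all saved jobs in the metastore
sqoop job --list

Is that correct, or do I have to update --last-value myself before every run?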

 

 
