Created 03-02-2016 09:07 PM
Hi, I have table Service Requests (SRs) which is having 10k records as of today trying to implement incremental logic so that when i run tomorrow need to capture only newly inserted and updated records
Table structure : row_id (unique), SR# , Sr_owner,sr_group,sr_last_update_date (timestamp)
Approache :
1) created sqoop job as below
sqoop job --create sr_table -- import --connect "jdbc:sqlserver://localhost:1431;database=test;username=root;password=welcome1" --query 'select * from sr_table where $CONDITIONS' --target-dir /data/sr/sr_table --append --check-column SR_LAST_UPD --incremental lastmodified --last-value '1900-01-01' --split-by ROW_ID
2) While running this for the first time its working fine (Full data or full load)
3) If I re-execute it again its not comparing he Upper Bound value with last run time (SR_LAST_UPD)
Was able to follow this (https://github.com/abajwa-hw/single-view-demo/blob/master/singleview-mysql-advanced-23.md)
I'm looking for more dynamic way of comparing the sqoop job execution time to compare with sr_last_upd value
Thanks -
Created 03-03-2016 01:22 AM
For next run, you need to modify last-value to last-value returned from first run. See https://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html#id1764421 section 7.2.7 on how to do incremental imports. You can use a sqoop job or Oozie that can automate this process.
Created 03-03-2016 01:22 AM
For next run, you need to modify last-value to last-value returned from first run. See https://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html#id1764421 section 7.2.7 on how to do incremental imports. You can use a sqoop job or Oozie that can automate this process.
Created 03-03-2016 01:26 AM
@Shishir , Thanks for your reply ...
1) Saving the job with 1900-01-01 (first time)
2) Save the same job with last-value should solve the problem (modifying same job)?
sqoop job --create sr_table --import--connect "jdbc:sqlserver://localhost:1431;database=test;username=root;password=welcome1"--query 'select * from sr_table where $CONDITIONS'--target-dir /data/sr/sr_table --append --check-column SR_LAST_UPD --incremental lastmodified --last-value last-value --split-by ROW_ID
Created 03-03-2016 02:22 AM
Hi @saichand varanasi, Sqoop incremental import works either in the append mode or in the lastmodified mode, but you have specified both on your command line. Check here for command specs, and here for a lastomodified Sqoop job sample. Sqoop job will keep track of the "last-value" for you, no need to do that manually.
Created 03-03-2016 04:59 PM
Thanks @Shishir Saxena and @Predrag Minovic
@Predrag Minovic Thank you but if I don't use any append or merge-key its throwing error as below
With the below job definition sqoop is able to pick the date dynamically hopefully this should not be a problem?
ERROR tool.ImportTool: Error during import: --merge-key or --append is required when using --incremental lastmodified and the output directory exists.
sqoop job --create SRs-- import --connect "jdbc:sqlserver://localhost:1431;database=xx;username=root;password=welcome1" --query 'select * from sr_table where $CONDITIONS' --target-dir /data/sr/sr_table --append --check-column LAST_UPD --incremental lastmodified --last-value '2016-03-03 15:56:47.92' --split-by ROW_ID