Support Questions

Find answers, ask questions, and share your expertise

Sqoop Incremental scenairo

avatar
New Contributor

Hi, I have table Service Requests (SRs) which is having 10k records as of today trying to implement incremental logic so that when i run tomorrow need to capture only newly inserted and updated records

Table structure : row_id (unique), SR# , Sr_owner,sr_group,sr_last_update_date (timestamp)

Approache :

1) created sqoop job as below

sqoop job --create sr_table -- import --connect "jdbc:sqlserver://localhost:1431;database=test;username=root;password=welcome1" --query 'select * from  sr_table where $CONDITIONS' --target-dir /data/sr/sr_table --append --check-column SR_LAST_UPD --incremental lastmodified --last-value '1900-01-01' --split-by ROW_ID

2) While running this for the first time its working fine (Full data or full load)

3) If I re-execute it again its not comparing he Upper Bound value with last run time (SR_LAST_UPD)

Was able to follow this (https://github.com/abajwa-hw/single-view-demo/blob/master/singleview-mysql-advanced-23.md)

I'm looking for more dynamic way of comparing the sqoop job execution time to compare with sr_last_upd value

Thanks -

1 ACCEPTED SOLUTION

avatar
@saichand varanasi

For next run, you need to modify last-value to last-value returned from first run. See https://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html#id1764421 section 7.2.7 on how to do incremental imports. You can use a sqoop job or Oozie that can automate this process.

View solution in original post

4 REPLIES 4

avatar
@saichand varanasi

For next run, you need to modify last-value to last-value returned from first run. See https://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html#id1764421 section 7.2.7 on how to do incremental imports. You can use a sqoop job or Oozie that can automate this process.

avatar
New Contributor

@Shishir , Thanks for your reply ...

1) Saving the job with 1900-01-01 (first time)

2) Save the same job with last-value should solve the problem (modifying same job)?

sqoop job --create sr_table --import--connect "jdbc:sqlserver://localhost:1431;database=test;username=root;password=welcome1"--query 'select * from  sr_table where $CONDITIONS'--target-dir /data/sr/sr_table --append --check-column SR_LAST_UPD --incremental lastmodified --last-value  last-value --split-by ROW_ID

avatar
Master Guru

Hi @saichand varanasi, Sqoop incremental import works either in the append mode or in the lastmodified mode, but you have specified both on your command line. Check here for command specs, and here for a lastomodified Sqoop job sample. Sqoop job will keep track of the "last-value" for you, no need to do that manually.

avatar
New Contributor

Thanks @Shishir Saxena and @Predrag Minovic

@Predrag Minovic Thank you but if I don't use any append or merge-key its throwing error as below

With the below job definition sqoop is able to pick the date dynamically hopefully this should not be a problem?

ERROR tool.ImportTool: Error during import: --merge-key or --append is required when using --incremental lastmodified and the output directory exists.
sqoop job --create SRs-- import --connect "jdbc:sqlserver://localhost:1431;database=xx;username=root;password=welcome1" --query 'select * from  sr_table where $CONDITIONS' --target-dir /data/sr/sr_table --append --check-column LAST_UPD --incremental lastmodified --last-value '2016-03-03 15:56:47.92' --split-by ROW_ID