
Why am I getting duplicate records with Sqoop incremental lastmodified?


I'm importing tables from DB2 into HCatalog in ORC format using the lastmodified option. Some tables import correctly, but others sometimes end up with duplicate records. What might be the problem?

Thank you.


Hi @Ravikiran Dasari!
Could you share your sqoop call?
By the way, I'm not sure it's your case, but I once had a similar problem and solved it by passing the correct timestamp to --last-value.
Also, could you confirm whether these "duplicate records" appear within a particular window of timestamps or with a particular number of mappers?
For example: "It only happens if I set more than 4 (default) mappers, and the timestamps of the duplicated records are close to the value of --last-value."

Hi @Vinicius Higa Murakami,

Thanks for the response.

The --last-value is taken from the saved Sqoop job; if I supply it manually there is no issue. In my source DB, new records are added at 2018-07-29 01:20:08, and my Sqoop import ran at 2018-07-29 15:10:08.234980. The next source load is at 2018-07-30 01:30:08, and my Sqoop import runs at 2018-07-30 14:10:08.234980. This run imports the 2018-07-29 records as well as the 2018-07-30 records. It doesn't happen every time; sometimes it imports only the 2018-07-30 records. My import statement is as follows:

sqoop job --create PACKAGE_EVENT_AUTOBOOST_SETUP_AMOUNT_JOB -- import --options-file '/home/hdfs/sqoopimport/DBConnections/connectionDetails.txt' --password-file 'hdfs://' --table REPORT.PACKAGE_EVENT_AUTOBOOST --incremental lastmodified --check-column LOAD_AT -m 1 --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database SNDPD --hcatalog-table report_PACKAGE --hcatalog-storage-stanza 'stored as orcfile'.
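
To make the timeline above concrete, here is a minimal Python sketch of how an incremental window could select rows. It is a simplified model, not Sqoop's actual code: the function name `select_incremental`, the inclusive lower bound and exclusive upper bound, and the "stale" saved value of 2018-07-28 are all assumptions made for illustration. It only shows that if the saved last value is not advanced after a run, the previous day's rows are selected again.

```python
from datetime import datetime

def select_incremental(rows, last_value, import_start):
    """Return rows whose LOAD_AT lies in [last_value, import_start).

    The bounds are an assumption of this model, not a statement about Sqoop.
    """
    return [r for r in rows if last_value <= r["LOAD_AT"] < import_start]

# Source rows, using the load times quoted in the post (ids are hypothetical).
rows = [
    {"id": 1, "LOAD_AT": datetime(2018, 7, 29, 1, 20, 8)},
    {"id": 2, "LOAD_AT": datetime(2018, 7, 30, 1, 30, 8)},
]

first_run = datetime(2018, 7, 29, 15, 10, 8)
second_run = datetime(2018, 7, 30, 14, 10, 8)

# Case 1: the saved last value was advanced to the first run's start time.
advanced = select_incremental(rows, first_run, second_run)

# Case 2: the saved last value is stale (hypothetical earlier timestamp).
stale = select_incremental(rows, datetime(2018, 7, 28), second_run)

print([r["id"] for r in advanced])
print([r["id"] for r in stale])
```

If the real job behaves like the second case on some runs, comparing the last value stored in the saved job before and after each execution would be a reasonable first check.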


Hello @Ravikiran Dasari!
Okay. Are these rows being updated in the --check-column LOAD_AT? If so, they will be imported again only if the new value is greater than --last-value (or the value saved in the sqoop job); otherwise only new rows should be imported.
One thing you could look at is merging your datasets so that each PK keeps only its most recent record 🙂
Hope this helps!
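
As a rough illustration of the merge idea, the following Python sketch keeps only the most recent record per key. It is not Sqoop's merge tool; the helper `keep_latest` and the key column `ID` are hypothetical names used only for this example, and it assumes the table does have a usable key.

```python
from datetime import datetime

def keep_latest(rows, key, check_column):
    """Keep, for each key value, the row with the greatest check_column."""
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[check_column] > latest[k][check_column]:
            latest[k] = row
    return list(latest.values())

# Hypothetical data: ID 1 was imported twice with different LOAD_AT values.
rows = [
    {"ID": 1, "LOAD_AT": datetime(2018, 7, 29, 1, 20, 8), "AMOUNT": 10},
    {"ID": 2, "LOAD_AT": datetime(2018, 7, 29, 1, 20, 8), "AMOUNT": 20},
    {"ID": 1, "LOAD_AT": datetime(2018, 7, 30, 1, 30, 8), "AMOUNT": 15},
]

merged = keep_latest(rows, "ID", "LOAD_AT")
for row in merged:
    print(row["ID"], row["LOAD_AT"], row["AMOUNT"])
```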

Hi @Vinicius Higa Murakami,

There are no updates; I am getting full-row duplicates, so I think it's a problem with the Sqoop tool. And I can't use merge because the table has no PK.
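
When there is no key to merge on, one possible workaround is to treat the whole row as its own identity and drop exact repeats after import. The Python sketch below shows that idea only; `drop_full_row_duplicates` and the sample columns are hypothetical, and it assumes every column value is hashable. Note that it would also collapse rows that are legitimately identical in the source.

```python
def drop_full_row_duplicates(rows):
    """Return rows in original order, skipping exact full-row repeats."""
    seen = set()
    unique = []
    for row in rows:
        # Sort by column name so the fingerprint ignores dict ordering.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique

# Hypothetical imported rows; the first and last are exact duplicates.
imported = [
    {"LOAD_AT": "2018-07-29 01:20:08", "EVENT": "A", "AMOUNT": 10},
    {"LOAD_AT": "2018-07-30 01:30:08", "EVENT": "B", "AMOUNT": 20},
    {"AMOUNT": 10, "EVENT": "A", "LOAD_AT": "2018-07-29 01:20:08"},
]

deduplicated = drop_full_row_duplicates(imported)
print(len(imported), len(deduplicated))
```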

Sqoop import with lastmodified is not giving consistent results. Sometimes it imports the whole day's records instead of only the records since the last import.

Anyway, thanks a lot.
