Sqoop lastmodified - custom upper bound

Explorer

From what I understand, in lastmodified incremental mode Sqoop selects records where timestamp_column >= the last modified timestamp and timestamp_column < current_time. Is there a way to customize that current_time upper bound? For example, can I use current_time - 1 hour instead?

We have transactions being created on one server, then replicated to another server, then Sqooped from there. I noticed some missing data in our cluster today, and suspect replication delay as the root cause.
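To make the suspected failure concrete, here is a minimal sketch of the window logic described above. It is a plain Python simulation, not Sqoop code; the timestamps, the five-minute replication delay, and the `lag` idea are all made-up assumptions for illustration.

```python
from datetime import datetime, timedelta

def in_window(ts, lower, upper):
    """Half-open window as described above: lower <= ts < upper."""
    return lower <= ts < upper

# Hypothetical timeline (all values are assumptions for illustration).
last_value = datetime(2024, 1, 1, 11, 0)   # lower bound saved by an earlier run
run1_start = datetime(2024, 1, 1, 12, 0)   # current_time of import run 1
run2_start = datetime(2024, 1, 1, 13, 0)   # current_time of import run 2

row_ts = datetime(2024, 1, 1, 11, 59)      # timestamp written on the source server
visible_at = datetime(2024, 1, 1, 12, 5)   # when the row reaches the replica

# Run 1 covers [11:00, 12:00), but the row has not been replicated yet.
picked_run1 = visible_at <= run1_start and in_window(row_ts, last_value, run1_start)

# Run 2 covers [12:00, 13:00); the row is visible now, but 11:59 is below the lower bound.
picked_run2 = visible_at <= run2_start and in_window(row_ts, run1_start, run2_start)

# What I am asking for: an upper bound of current_time - lag (hypothetical feature).
lag = timedelta(hours=1)
picked_run2_with_lag = visible_at <= run2_start and in_window(
    row_ts, run1_start - lag, run2_start - lag
)

print(picked_run1, picked_run2, picked_run2_with_lag)
```

With these assumed times, neither regular run picks up the row, while a run whose window trails current_time by an hour would.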

1 ACCEPTED SOLUTION

Rising Star

@Rekonn

At this moment, Sqoop doesn't support specifying a custom upper bound value for lastmodified mode incremental import. Please create a JIRA to track this requirement.

For now, could you try specifying a smaller (earlier) value for --last-value so that the older data is re-imported? The merge job that runs after the import drops the duplicate records, so all of the missing records still end up in Hadoop.
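A minimal sketch of this workaround, assuming a wrapper script builds the Sqoop command line. The connection string, table, target directory, column names, and the one-hour overlap are hypothetical placeholders; only the Sqoop options themselves (`--incremental lastmodified`, `--check-column`, `--last-value`, `--merge-key`) are real. The timestamp format may need adjusting for your database.

```python
from datetime import datetime, timedelta

def backed_off_last_value(last_value, overlap):
    """Move the lower bound back by `overlap` so late-replicated rows are re-read."""
    return (last_value - overlap).strftime("%Y-%m-%d %H:%M:%S")

def build_sqoop_args(last_value, overlap):
    """Build the argument list for an incremental import with an overlapping window."""
    return [
        "sqoop", "import",
        "--connect", "jdbc:mysql://replica.example.com/sales",  # hypothetical URL
        "--table", "transactions",                              # hypothetical table
        "--target-dir", "/data/transactions",                   # hypothetical path
        "--incremental", "lastmodified",
        "--check-column", "updated_at",                         # hypothetical column
        "--last-value", backed_off_last_value(last_value, overlap),
        "--merge-key", "id",                                    # hypothetical key column
    ]

# Example: the previous run recorded 12:00, so start again from 11:00.
args = build_sqoop_args(datetime(2024, 1, 1, 12, 0, 0), timedelta(hours=1))
print(" ".join(args))
```

The overlap should be at least as long as the worst replication delay you expect; rows imported twice are collapsed by the merge on the key column.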


2 REPLIES

Explorer