Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Sqoop lastmodified - custom upper bound

avatar
Explorer

From what I understand, for the lastmodified update method, Sqoop selects records where timestamp_column >= last modified timestamp and timestamp column < current_time. Is there a way to customize that current_time upper bound? Can I do something like current_time - 1 hour? 

 

We have transactions being created on one server, then replicated to another server, then Sqooped from there. I noticed some missing data in our cluster today, and suspect replication delay as the root cause.

 

 

1 ACCEPTED SOLUTION

avatar
Rising Star

@Rekonn

At this moment, Sqoop doesn't support specifying a custom upper bound value for lastmodified mode incremental import. Please create a JIRA to track this requirement.

For now, could you try specifying a smaller value for --last-value so that the old data can be re-imported. With the merge job running after importing, duplicate records would be dropped. This way, you can have all the missing records be imported to Hadoop.

View solution in original post

2 REPLIES 2

avatar
Rising Star

@Rekonn

At this moment, Sqoop doesn't support specifying a custom upper bound value for lastmodified mode incremental import. Please create a JIRA to track this requirement.

For now, could you try specifying a smaller value for --last-value so that the old data can be re-imported. With the merge job running after importing, duplicate records would be dropped. This way, you can have all the missing records be imported to Hadoop.

avatar
Explorer