08-25-2016 01:47 PM
From what I understand, for the lastmodified update method, Sqoop selects records where timestamp_column >= last modified timestamp and timestamp column < current_time. Is there a way to customize that current_time upper bound? Can I do something like current_time - 1 hour?
We have transactions being created on one server, then replicated to another server, then Sqooped from there. I noticed some missing data in our cluster today, and suspect replication delay as the root cause.
09-01-2016 08:32 PM
At this moment, Sqoop doesn't support specifying a custom upper bound value for lastmodified mode incremental import. Please create a JIRA to track this requirement.
For now, could you try specifying a smaller value for --last-value so that the old data can be re-imported. With the merge job running after importing, duplicate records would be dropped. This way, you can have all the missing records be imported to Hadoop.