Explorer
Posts: 12
Registered: ‎05-20-2016
Accepted Solution

Sqoop lastmodified - custom upper bound

From what I understand, in lastmodified incremental mode Sqoop selects records where timestamp_column >= the last modified timestamp and timestamp_column < current_time. Is there a way to customize that current_time upper bound? For example, can I use current_time - 1 hour?

 

We have transactions being created on one server, then replicated to another server, then Sqooped from there. I noticed some missing data in our cluster today, and suspect replication delay as the root cause.
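To make the suspected failure concrete, here is a minimal sketch of how a row that replicates late could fall between two import windows. It assumes the window described above (lower bound inclusive, upper bound exclusive); the timestamps, the `visible_at` replication time, and the function names are hypothetical and only illustrate the idea.

```python
from datetime import datetime

def in_window(ts, lower, upper):
    """Window as described in the question: lower <= ts < upper."""
    return lower <= ts < upper

def imported_by(run_lower, run_upper, row_ts, visible_at):
    """A run executing at run_upper picks up a row only if the row has
    already reached the replica and its timestamp is inside the window."""
    return visible_at < run_upper and in_window(row_ts, run_lower, run_upper)

# Hypothetical row: written on the source at 09:58, replicated at 10:03.
row_ts = datetime(2016, 5, 20, 9, 58)
visible_at = datetime(2016, 5, 20, 10, 3)

# Two consecutive hourly imports; each starts where the previous one ended.
runs = [
    (datetime(2016, 5, 20, 9, 0), datetime(2016, 5, 20, 10, 0)),
    (datetime(2016, 5, 20, 10, 0), datetime(2016, 5, 20, 11, 0)),
]

# Run 1 executes before the row arrives; run 2 starts after the row's timestamp.
missed = not any(imported_by(lo, hi, row_ts, visible_at) for lo, hi in runs)
```

Under these assumptions neither run imports the row, which would match the missing data we saw.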

 

 

Cloudera Employee
Posts: 35
Registered: ‎08-18-2014

Re: Sqoop lastmodified - custom upper bound

@Rekonn

At the moment, Sqoop doesn't support specifying a custom upper bound for lastmodified incremental imports. Please create a JIRA ticket to track this requirement.

For now, could you try specifying an earlier value for --last-value so that the older data is re-imported? If the merge job runs after the import, duplicate records are dropped, so all the missing records still end up in Hadoop.
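As a rough sketch of this workaround, the snippet below moves a stored --last-value back by a safety margin and assembles the import command. The one-hour margin, the JDBC URL, and the table, column, and directory names are all hypothetical placeholders; only the Sqoop flags themselves (--incremental, --check-column, --last-value, --merge-key, --target-dir) are real options.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"

def rewound_last_value(last_value, overlap):
    """Return last_value moved back by `overlap`, in the same text format."""
    return (datetime.strptime(last_value, TS_FORMAT) - overlap).strftime(TS_FORMAT)

def build_import_command(last_value, overlap=timedelta(hours=1)):
    """Build a sqoop import argument list with an earlier lower bound.

    Everything except the flag names is a placeholder for illustration.
    """
    return [
        "sqoop", "import",
        "--connect", "jdbc:mysql://replica.example.com/sales",  # hypothetical
        "--table", "transactions",                              # hypothetical
        "--target-dir", "/data/transactions",                   # hypothetical
        "--incremental", "lastmodified",
        "--check-column", "updated_at",                         # hypothetical
        "--last-value", rewound_last_value(last_value, overlap),
        "--merge-key", "id",  # hypothetical key; lets the merge step drop duplicates
    ]

cmd = build_import_command("2016-05-20 00:30:00")
```

The list can be passed to `subprocess.run(cmd, check=True)`. The overlap should be at least as long as the worst replication delay you expect, since rows re-imported inside the overlap are collapsed by the merge on the key column.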

Explorer
Posts: 12
Registered: ‎05-20-2016

Re: Sqoop lastmodified - custom upper bound
