Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here. Want to know more about what has changed? Check out the Community News blog.

Sqoop lastmodified - custom upper bound

SOLVED Go to solution

Sqoop lastmodified - custom upper bound

Explorer

From what I understand, for the lastmodified update method, Sqoop selects records where timestamp_column >= last modified timestamp and timestamp column < current_time. Is there a way to customize that current_time upper bound? Can I do something like current_time - 1 hour? 

 

We have transactions being created on one server, then replicated to another server, then Sqooped from there. I noticed some missing data in our cluster today, and suspect replication delay as the root cause.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Sqoop lastmodified - custom upper bound

Contributor

@Rekonn

At this moment, Sqoop doesn't support specifying a custom upper bound value for lastmodified mode incremental import. Please create a JIRA to track this requirement.

For now, could you try specifying a smaller value for --last-value so that the old data can be re-imported. With the merge job running after importing, duplicate records would be dropped. This way, you can have all the missing records be imported to Hadoop.

2 REPLIES 2

Re: Sqoop lastmodified - custom upper bound

Contributor

@Rekonn

At this moment, Sqoop doesn't support specifying a custom upper bound value for lastmodified mode incremental import. Please create a JIRA to track this requirement.

For now, could you try specifying a smaller value for --last-value so that the old data can be re-imported. With the merge job running after importing, duplicate records would be dropped. This way, you can have all the missing records be imported to Hadoop.

Highlighted

Re: Sqoop lastmodified - custom upper bound

Explorer