Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Sqoop incremental import when the source table has separate created and updated columns

Sqoop incremental import when the source table has separate created and updated columns

Rising Star

I have to import daily data from SQL Server using Sqoop (1.4.6). Naturally, incremental imports would seem to be the way to go, but here's the thing - the source tables I have to import have separate 'created' and 'updated' columns. That is, newly-created records only have a timestamp value under the 'created' column, and records will only have a value for the 'updated' column if they are edited anytime after first creation. Looking through the documentation, it doesn't look like it's possible to have 2 check columns when doing incremental imports, so the only way I've managed to get this done is with 2 separate imports:

  1. Incremental import with the primary key as check column for NEW records
  2. Incremental import with the updated column as check column for UPDATED records

This works, but I wonder if there's a better way to do this. Any thoughts?

Update: I tried the suggestion by @Sourygna Luangsay to use free-form query imports, however the documentation quite clearly states that the query can't contain OR conditions in the WHERE clause. Besides, since these will be incremental imports, the output directory would exist and I'd still need a two-step workflow: the first step for importing, then the second for merging.

5 REPLIES 5

Re: Sqoop incremental import when the source table has separate created and updated columns

Expert Contributor

To do that with a single query, you could do "manually" the incremental with a custom query in your Sqoop command. Something like that:

sqoop ... -query "select * from sourceTable where created >= yesterday or updated >= yesterday"

Re: Sqoop incremental import when the source table has separate created and updated columns

Rising Star

I'd thought of that actually, though I try to avoid hard-coding SQL whenever I use Sqoop. To me, it feels a bit hacky. It'd be interesting to hear people's thoughts on that, however.

Re: Sqoop incremental import when the source table has separate created and updated columns

Rising Star

Just went through the documentation though, and there's a note saying free-form queries can't have OR in the WHERE clause?

Re: Sqoop incremental import when the source table has separate created and updated columns

Expert Contributor

Give it a try.

I've just done a quick test and the OR condition worked for me (although I have to admit that I currently use the Sqoop Teradata driver, which is quite special).

Re: Sqoop incremental import when the source table has separate created and updated columns

Rising Star

I did try it before I updated my post, and the OR condition doesn't work for me. Only the condition on the left side of the OR seems to be evaluated. For example, in the condition "where created >= yesterday or updated >= yesterday" - only "created >= yesterday" is evaluated. I tried reversing the order of the conditions giving me "where updated >= yesterday or created >= yesterday" - and in this case only "updated >= yesterday" is evaluated.

Don't have an account?
Coming from Hortonworks? Activate your account here