Is it possible to do an incremental import using Sqoop?

Hi,

Does anyone know whether it is possible to do an incremental import using Sqoop? If yes, how?

1 ACCEPTED SOLUTION

Master Mentor

@Rushikesh Deshmukh

See this https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_incremental_imports

Sqoop provides an incremental import mode which can be used to retrieve only rows newer than some previously-imported set of rows.

The following arguments control incremental imports:

Table 4. Incremental import arguments:

Argument                Description
--check-column (col)    Specifies the column to be examined when determining which rows to import.
--incremental (mode)    Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified.
--last-value (value)    Specifies the maximum value of the check column from the previous import.

Sqoop supports two types of incremental imports: append and lastmodified. You can use the --incremental argument to specify the type of incremental import to perform.

You should specify append mode when importing a table where new rows are continually being added with increasing row id values. You specify the column containing the row’s id with --check-column. Sqoop imports rows where the check column has a value greater than the one specified with --last-value.
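
As a rough illustration of append mode, the sketch below prints a possible command line instead of running it, so it can be inspected first. The JDBC URL, user name, table and column (db.example.com, someuser, employee, id) are placeholders I made up for the example; only the Sqoop flags themselves come from the documentation quoted above.

```shell
# Dry-run sketch: prints an append-mode import command, does not run Sqoop.
# Connection string, user, table and column are hypothetical placeholders.
append_import_cmd() {
  # $1 = highest value of the check column seen by the previous import
  echo "sqoop import" \
    "--connect 'jdbc:mysql://db.example.com/test'" \
    "--username someuser -P" \
    "--table employee" \
    "--incremental append" \
    "--check-column id" \
    "--last-value $1" \
    "-m 1"
}

# Only rows with id greater than 71 would be imported by the printed command.
append_import_cmd 71
```

Once the printed command looks right, it could be run with something like eval "$(append_import_cmd 71)" on a machine where Sqoop is installed.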

An alternate table update strategy supported by Sqoop is called lastmodified mode. You should use this when rows of the source table may be updated, and each such update will set the value of a last-modified column to the current timestamp. Rows where the check column holds a timestamp more recent than the timestamp specified with --last-value are imported.
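
A lastmodified import differs mainly in the mode and in the type of the check column. The sketch below again only prints the command. The column name updated_at and the timestamp are assumptions for illustration, and the connection details are placeholders; depending on your Sqoop version and target, additional options may be needed when re-importing updated rows into an existing location.

```shell
# Dry-run sketch: prints a lastmodified-mode import command, does not run Sqoop.
# Connection string, user, table and the updated_at column are hypothetical.
lastmodified_import_cmd() {
  # $1 = timestamp reported as the next --last-value by the previous import
  echo "sqoop import" \
    "--connect 'jdbc:mysql://db.example.com/test'" \
    "--username someuser -P" \
    "--table employee" \
    "--incremental lastmodified" \
    "--check-column updated_at" \
    "--last-value '$1'" \
    "-m 1"
}

# Rows whose updated_at is more recent than this timestamp would be imported.
lastmodified_import_cmd "2016-01-01 00:00:00"
```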

At the end of an incremental import, the value which should be specified as --last-value for a subsequent import is printed to the screen. When running a subsequent import, you should specify --last-value in this way to ensure you import only the new or updated data. This is handled automatically by creating an incremental import as a saved job, which is the preferred mechanism for performing a recurring incremental import. See the section on saved jobs later in this document for more information.
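
To show what the saved-job approach might look like, here is a small sketch that prints the commands for creating, running and inspecting a saved job. The job name incremental_employee, the starting --last-value of 0, and the connection details are all assumptions for the example. Note the space between "--" and "import" in the create command.

```shell
# Dry-run sketch: prints saved-job commands, does not run Sqoop.
# Job name and connection details are hypothetical placeholders.
JOB_NAME=incremental_employee

create_job_cmd() {
  # --last-value 0 is an assumed starting point for the first run
  echo "sqoop job --create $JOB_NAME -- import" \
    "--connect 'jdbc:mysql://db.example.com/test'" \
    "--username someuser -P" \
    "--table employee" \
    "--incremental append" \
    "--check-column id" \
    "--last-value 0" \
    "-m 1"
}

# Each execution imports rows newer than the value stored by the previous run.
exec_job_cmd() { echo "sqoop job --exec $JOB_NAME"; }

# Shows the saved parameters of the job, including the stored last value.
show_job_cmd() { echo "sqoop job --show $JOB_NAME"; }

create_job_cmd
exec_job_cmd
show_job_cmd
```

With a saved job you create it once and then only repeat the --exec step; Sqoop keeps track of the last imported value between runs, so you do not pass --last-value by hand each time.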

3 REPLIES 3

@Neeraj Sabharwal, thanks for the quick reply.

Hi Neeraj Sabharwal,

@Neeraj Sabharwal

@Rushikesh Deshmukh

These are the steps I followed for an incremental import with Sqoop into an HBase table.

Step 1:

Importing a Table To HBase

sqoop import --connect "jdbc:sqlserver://x.x.x.x:1433;database=test" --username sa -P --table employee --hbase-table employee --hbase-create-table --column-family cf --hbase-row-key id -m 1

Step 2:

SQOOP HBASE INCREMENTAL IMPORT

sqoop import --connect "jdbc:sqlserver://x.x.x.x:1433;database=test" --username sa -P --table employee --incremental append --check-column id --last-value 71 -m 1

Step 3:

SQOOP JOB CREATION FOR HBASE INCREMENT

sqoop job --create incjobsnew -- import --connect "jdbc:sqlserver://x.x.x.x:1433;database=test" --username sa -P --table employee --incremental append --check-column id --last-value 71 -m 1

When I execute the Sqoop job:

sqoop job --exec incjobsnew

The Sqoop command runs successfully and reports the exact number of records retrieved. However, when I check HBase for the records, the retrieved rows are not there.

Could you tell me where the mistake is?

I also need to automate this Sqoop job in Oozie so that it runs at a particular time every day.