Sqoop Incremental Append

New Contributor

My database has a timestamp column, and I am running my Sqoop incremental import against it in lastmodified mode.

But if I give 11am as the last value for the check column, it doesn't retrieve the records that were inserted at 11am; it only imports records after that.

How do I import the records processed at 11am?

I don't want to have any duplicate records or any missing records.

Re: Sqoop Incremental Append

Champion

It is always recommended to run this as a Sqoop job, so that the last value is recorded automatically.

Would you consider using --incremental append together with --check-column, which specifies the column to be examined when determining which rows to import?

That will import all the new rows based on the last value.
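
A saved job along those lines might look like the sketch below. The job name, JDBC URL, user, password file, table, check column, target directory, and starting value are all hypothetical placeholders, not values from this thread. The script only echoes the commands when sqoop is not on the PATH, so it can be read as a dry run first.

```shell
#!/bin/sh
# Hypothetical placeholders: replace every value with your own.
JOB_NAME="orders_incremental"
JDBC_URL="jdbc:sqlserver://dbhost:1433;databaseName=sales"
DB_USER="etl_user"
PASSWORD_FILE="/user/etl/.db_password"
TABLE="orders"
CHECK_COLUMN="created_at"
TARGET_DIR="/data/raw/orders"
INITIAL_LAST_VALUE="2019-01-01 00:00:00"

# Dry-run fallback: print the commands if sqoop is not installed.
if command -v sqoop >/dev/null 2>&1; then
  SQOOP="sqoop"
else
  SQOOP="echo sqoop"
fi

# Create the saved job once. Note the space between "--" and "import".
$SQOOP job --create "$JOB_NAME" -- import \
  --connect "$JDBC_URL" \
  --username "$DB_USER" \
  --password-file "$PASSWORD_FILE" \
  --table "$TABLE" \
  --target-dir "$TARGET_DIR" \
  --incremental append \
  --check-column "$CHECK_COLUMN" \
  --last-value "$INITIAL_LAST_VALUE"

# Run the job on each schedule tick. After a successful run the saved job
# keeps the new last value, so you do not pass --last-value by hand again.
$SQOOP job --exec "$JOB_NAME"
```

Because the job stores the boundary itself, each run continues from where the previous one stopped instead of from a value typed in by hand, which is where missing or duplicated rows at the boundary usually come from.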

Re: Sqoop Incremental Append

Explorer

Dear All.

I have a table in SQL Server with a column that contains random unique numbers. There is no primary key, but we want to perform an incremental append or lastmodified import using Sqoop, so please help me.

Note: this is a critical issue.


Thanks

HadoopHelp

Re: Sqoop Incremental Append

Champion

You can use the lastmodified option.

Something like the below (fill in the placeholders in angle brackets):

sqoop import \
--connect <jdbc-connection-string> \
--username <username> \
--password <password> \
--table <table-name> \
--incremental lastmodified \
--check-column <last_updated_date, or whichever column fits your table> \
--last-value "2101-02-22 01:02:12"

Re: Sqoop Incremental Append

Explorer

Thanks!

 

But there is no time/date column in my table.

Then how can we perform a lastmodified operation?

Thanks

HadoopHelp

Re: Sqoop Incremental Append

Rising Star

I recall you don't need the column to be a date, but for Sqoop to know which records were added or changed after the point you have already imported, you do need something incremental.

If you have no column that can easily be used to determine whether a row is newer or not, the only conceptual way to know whether a row is new would be to keep track of which values have already been loaded. This bookkeeping is very heavy, and it is something that tools like Sqoop cannot do automatically.

Re: Sqoop Incremental Append

Explorer

Hi @DennisJaheruddi ,

 

Thanks for replying.

The problem here is identifying the newly added rows in Sqoop as well as in HDFS/Hive, because the data we get from SQL Server has no unique key value, i.e. each column contains duplicate data in a random manner. I tried hard to find a solution, but had no luck on my side.

Thanks, and I would appreciate it if you find a solution for this issue.


Thanks

HadoopHelp

Re: Sqoop Incremental Append

Rising Star
If you don't have a way to logically determine what data is new, it is conceptually impossible to only import the new data.

The only pattern that I can think of here is:

1. Load in all data on day 1 with Sqoop
2. Load in all data on day 2 with Sqoop
3. Use logic outside Sqoop to determine what your result should be after day 2 (e.g. by using Spark)

It means you need to load a lot of duplicate data. This is not a limitation of the loading tool, but an inherent consequence of not having an update timestamp in your data model.
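
To make step 3 concrete, here is a minimal sketch of the comparison logic in plain shell rather than Spark, assuming both full loads have been exported as line-oriented text files. The function name new_rows and the sample rows are hypothetical. It treats each snapshot as a multiset, so a row that appears three times on day 2 and twice on day 1 is reported once as new, which matters when rows are not unique.

```shell
#!/bin/sh
# new_rows OLD NEW
# Print every row that occurs more often in NEW than in OLD (multiset difference).
new_rows() {
  old_sorted=$(mktemp)
  new_sorted=$(mktemp)
  # comm needs both inputs sorted with the same collation.
  LC_ALL=C sort "$1" > "$old_sorted"
  LC_ALL=C sort "$2" > "$new_sorted"
  # -1 hides rows only in OLD, -3 hides rows common to both.
  LC_ALL=C comm -13 "$old_sorted" "$new_sorted"
  rm -f "$old_sorted" "$new_sorted"
}

# Small demonstration with made-up rows.
workdir=$(mktemp -d)
printf '%s\n' '17,widget' '17,widget' '42,gadget' > "$workdir/day1.csv"
printf '%s\n' '42,gadget' '17,widget' '17,widget' '17,widget' '99,gizmo' > "$workdir/day2.csv"

new_rows "$workdir/day1.csv" "$workdir/day2.csv"
```

This only scales to snapshots that fit comfortably on one machine. For larger tables the same idea would have to be expressed in Spark or Hive, and it still cannot detect a row that was updated in place, since nothing ties the old and new versions of the row together.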

Re: Sqoop Incremental Append

Champion

Let me know if you need any more information.