Sqoop incremental import using query in Teradata

Does an incremental import using a Sqoop free-form query support loading data into the same directory? I am getting the error below:

Caused by: com.teradata.connector.common.exception.ConnectorException: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /user/aps/incr already exists
        at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
        at com.teradata.connector.common.ConnectorOutputFormat.checkOutputSpecs(ConnectorOutputFormat.java:47)
        at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)

Sqoop command I am running:

sqoop import --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://**.***.***.***/DATABASE=**** --username **** --password **** --target-dir /user/aps/incr --query "select * from EMP2 where CREATE_TIME > '2016-07-14 07:06:00' AND \$CONDITIONS" -m 1

The above query runs fine the first time. I get the above error when I run the same query a second time with a changed WHERE clause parameter.

Sqoop's incremental options (append or lastmodified) can store data in the same directory, but the Teradata connection manager does not support the incremental option. Is there any strategy we can use to achieve this? Please help.

Re: Sqoop incremental import using query in Teradata

@Arkaprova Saha

This happens because "/user/aps/incr" already exists after the first run.

Try changing "--target-dir /user/aps/incr" to a different directory on the next run.

http://techpost360.blogspot.in/2015/09/hadoop-file-already-exists-exception.html
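A minimal bash sketch of the per-run directory idea, assuming the same Teradata connection manager and EMP2 query from the question. The base path `/user/aps/incr_runs`, the watermark file `last_value.txt` and the `TD_URL`/`TD_USER`/`TD_PASS` variables are hypothetical names introduced here, not part of the thread. Each run gets a timestamped `--target-dir`, so the existence check in `FileOutputFormat.checkOutputSpecs` (seen in the stack trace) has nothing to reject.

```bash
#!/usr/bin/env bash
# Sketch only: names marked "hypothetical" are assumptions, not from the thread.

BASE_DIR="/user/aps/incr_runs"   # hypothetical parent directory for per-run imports
WATERMARK_FILE="last_value.txt"  # hypothetical local file holding the last CREATE_TIME used

# Print a target directory that is unique per run (UTC timestamp suffix),
# so the "output directory already exists" check never trips.
run_dir() {
  printf '%s/run_%s\n' "$BASE_DIR" "$(date -u +%Y%m%d%H%M%S)"
}

# Print the free-form query for rows newer than the watermark given in $1.
# \$CONDITIONS is escaped so Sqoop, not the shell, substitutes it.
build_query() {
  printf "select * from EMP2 where CREATE_TIME > '%s' AND \$CONDITIONS" "$1"
}

# Run one incremental import into a fresh directory and print that directory.
# TD_URL, TD_USER and TD_PASS are hypothetical environment variables standing
# in for the masked connection details in the question.
incremental_import() {
  local last target
  last=$(cat "$WATERMARK_FILE") || return 1
  target=$(run_dir)
  sqoop import \
    --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
    --connect "$TD_URL" \
    --username "$TD_USER" --password "$TD_PASS" \
    --target-dir "$target" \
    --query "$(build_query "$last")" \
    -m 1 || return 1
  echo "$target"
}
```

`incremental_import` prints the directory it wrote to. Two runs started within the same second would still collide, and choosing the next watermark (for example, the newest CREATE_TIME just imported) is left out of this sketch.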

Re: Sqoop incremental import using query in Teradata

@jss Thank you for your quick response. If I change to a different directory, how can I merge the two sets of data in different directories?

Re: Sqoop incremental import using query in Teradata

@Arkaprova Saha

The default implementation will not let you merge the output: it checks whether the output directory already exists and, if it does, throws the same exception: [org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /user/aps/incr already exists]

If you do not want that behaviour, you will need to write your own custom implementation of "FileOutputFormat", as described here:

http://johnnyprogrammer.blogspot.in/2012/01/custom-file-output-in-hadoop.html
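Writing a custom `FileOutputFormat` is one route. A lighter alternative, swapped in here and not what this reply describes, is to keep importing into a fresh staging directory and then move its part files into one shared directory under unique names, which is similar in spirit to what Sqoop's `--incremental append` mode does. A bash sketch, assuming MapReduce-style `part-*` file names and paths without spaces; `merge_into`, `unique_name` and the run id are hypothetical names:

```bash
#!/usr/bin/env bash
# Sketch: merge a per-run staging directory into one shared directory by
# moving its part files there under run-specific names.
# Assumes "part-*" output file names and HDFS paths without spaces.

# Print a collision-free file name: the original base name plus a run id.
unique_name() {
  printf '%s-%s\n' "$(basename "$1")" "$2"
}

# Move every part file from $1 (staging dir) into $2 (shared dir), tagging
# each name with run id $3, then delete the staging dir (which by then holds
# only markers such as _SUCCESS).
merge_into() {
  local stage="$1" dest="$2" run_id="$3" f
  hdfs dfs -mkdir -p "$dest" || return 1
  # In `hdfs dfs -ls` output the path is the 8th column; keep only part files.
  for f in $(hdfs dfs -ls "$stage" | awk '{print $8}' | grep '/part-'); do
    hdfs dfs -mv "$f" "$dest/$(unique_name "$f" "$run_id")" || return 1
  done
  hdfs dfs -rm -r "$stage"
}
```

For example, `merge_into /user/aps/incr_runs/run_20160714070600 /user/aps/incr 20160714070600` leaves `/user/aps/incr` holding the union of every run. That is enough for insert-only data like the CREATE_TIME filter in the question; rows that get updated would still need a reconciliation step.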

Re: Sqoop incremental import using query in Teradata

@Arkaprova Saha

Check out this article and use Oozie to stitch together the different steps. Much has changed in Hive since that article was written, so you may be able to optimize the reconciliation step.
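For an HDFS-only setup, the reconciliation step can be sketched without Hive: concatenate the base rows and the new increment, then keep only the newest row per key. This is a swapped-in plain-file technique, not the Hive approach the article describes. A minimal bash/awk sketch, assuming comma-delimited rows without embedded commas, the key in column 1 and a text-sortable last-modified timestamp in column 2; the `reconcile` function and this column layout are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch of a Hive-free "reconcile" step: keep only the newest version of
# each record from the base rows plus the newly imported rows.
# Hypothetical layout: column 1 = primary key, column 2 = last-modified
# timestamp that sorts as text (e.g. "2016-07-14 07:06:00").

reconcile() {
  awk -F',' '
    # Keep the row if its key is new or its timestamp is later than the stored one.
    !($1 in ts) || $2 > ts[$1] { ts[$1] = $2; row[$1] = $0 }
    # Emit one row per key; awk array order is unspecified, hence the sort below.
    END { for (k in row) print row[k] }
  ' "$@" | LC_ALL=C sort
}
```

Against HDFS it could be wired up as `hdfs dfs -cat '/user/aps/base/part-*' '/user/aps/incr/part-*' | reconcile | hdfs dfs -put - /user/aps/reconciled/part-00000` (all three paths hypothetical); replacing the base directory with the reconciled one and deleting the consumed increment would cover the remaining steps. Streaming everything through one machine only suits modest data volumes.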

Re: Sqoop incremental import using query in Teradata

@Vladimir Zlatkin

Thanks for your response. Can we apply this four-step strategy to an HDFS-only import? I do not need Hive at the moment. Please advise.