Sqoop import with warehouse-dir argument

SOLVED


Expert Contributor

I am using the --warehouse-dir argument to load data into HDFS before Sqoop puts it into Hive. I run all my Sqoop jobs through Oozie.

Now, if the task fails for some reason, it is reattempted, and the problem is that the warehouse directory created by the previous attempt still exists, so the re-attempt fails with the error: output directory already exists.

I understand I could use the --direct argument to skip the intermediate HDFS loading step, but I also need the --hive-drop-import-delims argument, and that is not supported together with --direct. Advice, please? It's important.
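For reference, a minimal sketch of the kind of import involved; the connection string, table name, and staging path are placeholders, not taken from the original post:

```shell
# Hypothetical example: import a table into Hive, staging through HDFS first.
# All connection details and paths below are placeholders.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --warehouse-dir /user/etl/staging \
  --hive-import \
  --hive-drop-import-delims
```

If a failed attempt leaves /user/etl/staging/orders behind, the Oozie re-attempt of this same command fails with "output directory already exists".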

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Sqoop import with warehouse-dir argument

Expert Contributor

Using the --delete-target-dir argument worked for me.
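A sketch of where that flag fits, assuming the same placeholder connection details and paths as in the question; with --delete-target-dir, Sqoop removes the import's output directory before writing, so a retried Oozie action no longer trips over the stale directory:

```shell
# Hypothetical example: same import as before, but Sqoop deletes the
# per-table output directory before the import starts, making retries safe.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --warehouse-dir /user/etl/staging \
  --delete-target-dir \
  --hive-import \
  --hive-drop-import-delims
```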

2 REPLIES

Re: Sqoop import with warehouse-dir argument

Expert Contributor

This is normal behavior.

You can either generate a dynamic directory name (e.g. output_dir_<timestamp>), though you may end up with a lot of directories, or add an HDFS action to delete the directory just before the Sqoop action. I recommend the second approach.
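A sketch of the cleanup step, using a placeholder staging path; in an Oozie workflow this would typically go in an <fs> action (or the Sqoop action's <prepare> block) immediately before the import:

```shell
# Remove the stale warehouse directory left by a previous failed attempt.
# -f keeps the command from failing when the directory does not exist,
# so the first run of the workflow is unaffected.
hdfs dfs -rm -r -f /user/etl/staging/orders
```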
