Sqoop import with warehouse-dir argument
- Labels: Apache Hive, Apache Oozie, Apache Sqoop, HDFS
Created on 05-25-2018 10:40 AM - edited 09-16-2022 06:16 AM
I am using the --warehouse-dir argument to stage data in HDFS before Sqoop loads it into Hive. I run all my Sqoop jobs through Oozie.
If a task fails for some reason, it is reattempted. The problem is that the warehouse directory created by the previous attempt is still there, so the re-attempt fails with the error: output directory already exists.
I understand I could use the --direct argument to skip the intermediate HDFS loading step, but I also need the --hive-drop-import-delims argument, and that is not supported with --direct. Any advice? This is important.
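For context on why the question needs `--hive-drop-import-delims`: the Sqoop user guide says it drops `\n`, `\r`, and `\01` from string fields when importing to Hive. Hive's default text format separates rows with newlines and fields with `\01` (Ctrl-A), so an embedded newline would otherwise split one source row into two Hive rows. The sketch below only emulates that per-field cleanup in Python for illustration; `drop_hive_delims` is a hypothetical helper, not Sqoop's implementation.

```python
from typing import Optional

# Characters --hive-drop-import-delims removes, per the Sqoop user guide:
# newline, carriage return, and \01 (Ctrl-A, Hive's default field delimiter).
_HIVE_DELIMS = {ord("\n"): None, ord("\r"): None, 0x01: None}


def drop_hive_delims(value: Optional[str]) -> Optional[str]:
    """Illustrative emulation of --hive-drop-import-delims on one string field."""
    if value is None:  # this sketch leaves None (SQL NULL) as-is
        return None
    # str.translate deletes every character mapped to None.
    return value.translate(_HIVE_DELIMS)


if __name__ == "__main__":
    raw = "line one\nline two\r\x01end"
    print(repr(drop_hive_delims(raw)))
```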
Created 05-29-2018 11:56 PM
Using the --delete-target-dir argument worked for me.
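A minimal sketch of the import from the question with `--delete-target-dir` added, assuming a `--table` import into Hive. The JDBC URL, table name, and staging path are hypothetical placeholders, and `build_sqoop_import_args` and `staging_dir` are illustrative helpers, not part of Sqoop. With `--warehouse-dir`, Sqoop writes a `--table` import to `<warehouse-dir>/<table>`; that leftover directory is what blocks the retry, and the reply above reports that `--delete-target-dir` clears it.

```python
import posixpath
from typing import List


def staging_dir(warehouse_dir: str, table: str) -> str:
    """HDFS directory Sqoop writes to for a --table import with --warehouse-dir."""
    # HDFS paths always use forward slashes, so posixpath is used regardless of OS.
    return posixpath.join(warehouse_dir, table)


def build_sqoop_import_args(connect: str, table: str, warehouse_dir: str,
                            delete_target_dir: bool = True) -> List[str]:
    """Assemble a Sqoop import command line (hypothetical helper, not part of Sqoop)."""
    args = [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--hive-import",
        "--warehouse-dir", warehouse_dir,
        # Drop \n, \r and \01 from string fields so they cannot break Hive rows.
        "--hive-drop-import-delims",
    ]
    if delete_target_dir:
        # Delete the target directory left by a failed attempt before importing.
        args.append("--delete-target-dir")
    return args


if __name__ == "__main__":
    argv = build_sqoop_import_args(
        "jdbc:mysql://db.example.com/sales",  # hypothetical JDBC URL
        "orders",                             # hypothetical table
        "/user/etl/staging",                  # hypothetical warehouse dir
    )
    print(" ".join(argv))
    print("staging dir:", staging_dir("/user/etl/staging", "orders"))
```

When the import runs through an Oozie Sqoop action, the same arguments, minus the leading `sqoop`, go into the action's `<command>` element or into separate `<arg>` elements.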
Created 05-28-2018 01:34 AM
This is normal behavior.
You can either generate a dynamic directory name (e.g. output_dir_timestamp), though you may end up with a lot of directories, or add an HDFS (fs) action that deletes the directory just before the Sqoop action. I recommend the latter approach.
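The recommended layout, an fs action that deletes the staging directory followed by the Sqoop action, could look roughly like the workflow generated below. This is a sketch: the workflow and action names, the `uri:oozie:workflow:0.5` and `uri:oozie:sqoop-action:0.4` schema versions, the `${jobTracker}`/`${nameNode}` properties, and the staging path are assumptions to adapt to your cluster. Python is used only to produce the XML and check that it is well-formed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def cleanup_then_import_workflow(staging_path: str, sqoop_command: str) -> str:
    """Return an Oozie workflow.xml that deletes staging_path, then runs Sqoop.

    Names and schema versions are illustrative assumptions, not requirements.
    """
    # ${{...}} inside the f-string emits a literal ${...} Oozie EL expression.
    return f"""<workflow-app name="sqoop-import-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="cleanup-staging"/>
  <!-- Remove the directory left behind by a failed run. -->
  <action name="cleanup-staging">
    <fs>
      <delete path={quoteattr(staging_path)}/>
    </fs>
    <ok to="sqoop-import"/>
    <error to="fail"/>
  </action>
  <action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.4">
      <job-tracker>${{jobTracker}}</job-tracker>
      <name-node>${{nameNode}}</name-node>
      <command>{escape(sqoop_command)}</command>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Sqoop import failed: [${{wf:errorMessage(wf:lastErrorNode())}}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
"""


if __name__ == "__main__":
    xml_text = cleanup_then_import_workflow(
        "${nameNode}/user/etl/staging/orders",  # hypothetical staging dir
        "import --connect jdbc:mysql://db.example.com/sales --table orders "
        "--hive-import --warehouse-dir /user/etl/staging --hive-drop-import-delims",
    )
    ET.fromstring(xml_text)  # raises ParseError if the XML is malformed
    print(xml_text)
```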