Problem with Oozie input path: how do I configure Oozie in Cloudera Manager to use an s3n input path?
Labels: Apache Oozie, Cloudera Manager, HDFS
Created on 08-13-2013 06:55 AM - edited 09-16-2022 01:46 AM
My input path is on s3n:
s3n://xxx-xxx/20130813/08
My Oozie configuration shows as follows (configuration screenshot not preserved).
Created 08-13-2013 07:06 AM
The link hdfs://xxx.internal:8020/s3n://xxx-xxx/20130813/08 requires a login.
Created 08-13-2013 07:33 AM
Sorry, to clarify: I'm running the Oozie job through Hue in Cloudera Manager, and I can access HDFS. My question is how to connect to another Amazon instance as s3n://xxx.
Created 08-13-2013 09:19 AM
The input path is required to be on HDFS, not S3. S3 is not the same as HDFS.
Created 08-18-2013 01:39 PM
@dvohra wrote: The input path is required to be on HDFS, not S3. S3 is not the same as HDFS.
This isn't true. Depending on what you're doing with Oozie, S3 is supported just fine as an input or output location.
Created 08-19-2013 03:13 PM
This isn't true. Depending on what you're doing with Oozie, S3 is supported just fine as an input or output location.
Doesn't the coordinator expect the input path to be on HDFS, since hdfs://{nameNode} is prepended automatically? And isn't the workflow.xml required to be on HDFS?
Created 08-28-2013 04:09 AM
@dvohra wrote:This isn't true. Depending on what you're doing with Oozie, S3 is supported just fine as an input or output location.
Doesn't the coordinator expect the input path to be on HDFS as hdfs://{nameNode} is prepended automatically? The workflow.xml is on the HDFS? Isn't the workflow.xml required to be on the HDFS?
Yes, unfortunately coordinators currently poll inputs over HDFS alone, which is a limitation. However, writing simple WF actions to work over S3 is still possible.
Yes, WFs should reside on HDFS, as Oozie views it as its central DFS, similar to how MR requires a proper DFS to run. But this shouldn't impair simple I/O operations done over an external FS such as S3.
I think Romain has covered the relevant JIRAs for tracking removal of this limitation.
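To illustrate the point above — the workflow.xml itself living on HDFS while an action reads its input from S3 — here is a hedged sketch of what such a workflow might look like. The bucket name, credential property values, and the identity mapper/reducer are placeholders, not taken from the thread; the s3n credential properties shown are the Hadoop 1.x-era ones.

```xml
<!-- Sketch only: workflow.xml is deployed on HDFS, but the map-reduce
     action reads a fully qualified s3n:// input path.
     example-bucket and the aws* parameters are placeholders. -->
<workflow-app name="s3-input-wf" xmlns="uri:oozie:workflow:0.2">
  <start to="mr-from-s3"/>
  <action name="mr-from-s3">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <!-- S3N credentials (placeholder parameter names) -->
        <property>
          <name>fs.s3n.awsAccessKeyId</name>
          <value>${awsAccessKey}</value>
        </property>
        <property>
          <name>fs.s3n.awsSecretAccessKey</name>
          <value>${awsSecretKey}</value>
        </property>
        <!-- Fully qualified S3 input; output written back to HDFS -->
        <property>
          <name>mapred.input.dir</name>
          <value>s3n://example-bucket/20130813/08</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>${nameNode}/user/${wf:user()}/out</value>
        </property>
        <property>
          <name>mapred.mapper.class</name>
          <value>org.apache.hadoop.mapred.lib.IdentityMapper</value>
        </property>
        <property>
          <name>mapred.reducer.class</name>
          <value>org.apache.hadoop.mapred.lib.IdentityReducer</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>MR job failed</message></kill>
  <end name="end"/>
</workflow-app>
```

Note that the coordinator limitation discussed above still applies: dataset polling happens over HDFS, so this sketch covers only the workflow-action side.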
Created 01-20-2014 09:05 AM
Created 01-20-2014 09:10 AM
If it is urgent you could try to patch your Hue with the fix: https://issues.cloudera.org/browse/HUE-1501?focusedCommentId=19417&page=com.atlassian.jira.plugin.sy...
Created 01-20-2014 01:54 PM
Thank you. However, I think my question is a little different from what that addresses. I am trying to specify an input or output directory of the form "s3://..." in an Oozie workflow itself (as an input to a Hadoop MapReduce job). Do you know if this should work? I get an error saying the path can't have "s3" in it.
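For reference, a hedged fragment of what the intended configuration might look like. As noted earlier in the thread, hdfs://{nameNode} is prepended to relative paths automatically, so the S3 location must be written as a fully qualified URI; the bucket name here is a placeholder, and whether Hue accepts the non-hdfs scheme may depend on the HUE-1501 fix linked above.

```xml
<!-- Sketch only: use a fully qualified URI so Oozie does not prepend
     hdfs://${nameNode}. example-bucket is a placeholder. -->
<property>
  <name>mapred.input.dir</name>
  <value>s3n://example-bucket/input</value>
</property>
```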
