Problem with Oozie input path: how do I configure Oozie in Cloudera Manager to use an s3n input path?

Explorer

My input path is on s3n:

s3n://xxx-xxx/20130813/08

But my Oozie configuration shows it as:

hdfs://xxx.internal:8020/s3n://xxx-xxx/20130813/08
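
The malformed path above looks like what happens when a NameNode prefix is prepended unconditionally, without checking whether the path already carries its own scheme. A minimal Python sketch of that behavior, and of a scheme-aware alternative, follows. It is an illustration only, not Hue's or Oozie's actual code, and the function names `naive_qualify` and `qualify_path` are hypothetical.

```python
from urllib.parse import urlparse


def naive_qualify(path: str, name_node: str) -> str:
    """Always prepend the NameNode URI (hypothetical helper; reproduces the malformed path)."""
    return name_node.rstrip("/") + "/" + path


def qualify_path(path: str, name_node: str) -> str:
    """Prepend the NameNode URI only when the path has no scheme of its own (hypothetical helper)."""
    if urlparse(path).scheme:
        # Already fully qualified (for example s3n://...), so leave it untouched.
        return path
    return name_node.rstrip("/") + "/" + path.lstrip("/")


name_node = "hdfs://xxx.internal:8020"
s3_input = "s3n://xxx-xxx/20130813/08"

broken = naive_qualify(s3_input, name_node)
fixed = qualify_path(s3_input, name_node)
print(broken)
print(fixed)
```

With the scheme check in place, the s3n path passes through unchanged, while a plain path such as /user/data/08 would still receive the HDFS prefix.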

1 ACCEPTED SOLUTION

Re: Problem in oozie input path . How do I configure the oozie in cloudera manager,input path as s3

Master Guru

@dvohra wrote:

This isn't true. Depending on what you're doing with Oozie, S3 is supported just fine as an input or output location.

 

Doesn't the coordinator expect the input path to be on HDFS, since hdfs://{nameNode} is prepended automatically? And isn't the workflow.xml required to be on HDFS?


Yes, unfortunately coordinators currently poll inputs over HDFS only, which is a limitation. However, writing simple WF actions that work over S3 is still possible.

Yes, WFs should reside on HDFS, as Oozie views it as its central DFS, similar to how MR requires a proper DFS to run. But this shouldn't impair simple I/O operations done over an external FS such as S3.

 

I think Romain has covered the relevant JIRAs for tracking removal of this limitation.
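
As a rough illustration of a workflow action that reads from and writes to S3, the Python sketch below renders the configuration block that a map-reduce action could carry. `mapred.input.dir` and `mapred.output.dir` are the classic MapReduce input and output properties; the output path and the helper name `build_configuration` are hypothetical, and S3 credentials are assumed to be configured elsewhere on the cluster. Whether Oozie accepts such paths depends on the version and setup in use.

```python
import xml.etree.ElementTree as ET


def build_configuration(properties: dict) -> str:
    """Render a <configuration> element from name/value pairs (hypothetical helper)."""
    config = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(config, encoding="unicode")


# Fully qualified s3n:// URIs, so no hdfs:// prefix is needed.
# The output location is a made-up example; credentials are assumed to be set elsewhere.
s3_properties = {
    "mapred.input.dir": "s3n://xxx-xxx/20130813/08",
    "mapred.output.dir": "s3n://xxx-xxx/output/20130813/08",
}

config_xml = build_configuration(s3_properties)
print(config_xml)
```

The workflow.xml itself would still live on HDFS, as noted above; only the action's input and output locations point at S3.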

19 REPLIES

Re: Problem in oozie input path . How do I configure the oozie in cloudera manager,input path as s3

Explorer

Sorry, to clarify: I'm running the Oozie job through Hue in Cloudera Manager, and I am able to access HDFS. My question is how to connect to another instance, Amazon S3, as s3n://xxx.

Re: Problem in oozie input path . How do I configure the oozie in cloudera manager,input path as s3

Rising Star

The input path is required to be on HDFS, not S3. S3 is not the same as HDFS.

Re: Problem in oozie input path . How do I configure the oozie in cloudera manager,input path as s3

Master Guru

@dvohra wrote:

The input path is required to be on HDFS, not S3. S3 is not the same as HDFS.


This isn't true. Depending on what you're doing with Oozie, S3 is supported just fine as an input or output location.

Re: Problem in oozie input path . How do I configure the oozie in cloudera manager,input path as s3

Rising Star

This isn't true. Depending on what you're doing with Oozie, S3 is supported just fine as an input or output location.

 

Doesn't the coordinator expect the input path to be on HDFS, since hdfs://{nameNode} is prepended automatically? And isn't the workflow.xml required to be on HDFS?

Re: Problem in oozie input path . How do I configure the oozie in cloudera manager,input path as s3

New Contributor
Are there any updates on this JIRA item? Is there now a way to specify an s3n:// location for input or output directories in an Oozie workflow without the location being prefixed with "hdfs"?

Re: Problem in oozie input path . How do I configure the oozie in cloudera manager,input path as s3

Yes, the fix for https://issues.cloudera.org/browse/HUE-1501 is in Hue 3.5, or CDH5 beta 2, due in one month (and in CDH5 in two months).

If it is urgent, you could try patching your Hue with the fix: https://issues.cloudera.org/browse/HUE-1501?focusedCommentId=19417&page=com.atlassian.jira.plugin.sy...

Re: Problem in oozie input path . How do I configure the oozie in cloudera manager,input path as s3

New Contributor

Thank you. However, I think my question is a little different from what that addresses. I am trying to specify an input or output directory of the form "s3://..." in an Oozie workflow itself (as an input to a Hadoop MapReduce job). Do you know if this should work? I get an error that says the path can't have "s3" in it.
