
Problem with Oozie input path: how do I configure an s3n:// input path for Oozie in Cloudera Manager?

Explorer

My input path is on s3n:

 

s3n://xxx-xxx/20130813/08

 

My Oozie configuration shows it as:

 

hdfs://xxx.internal:8020/s3n://xxx-xxx/20130813/08


19 REPLIES


Explorer

Sorry, to clarify: I'm running the Oozie job through Hue in Cloudera Manager, and I can access HDFS without problems. My question is how to connect to another Amazon instance as s3n://xxx.

Rising Star

The input path is required to be on HDFS, not S3. S3 is not the same as HDFS.

Master Guru

@dvohra wrote:

The input path is required to be on HDFS, not S3. S3 is not the same as HDFS.


This isn't true. Depending on what you're doing with Oozie, S3 is supported just fine as an input or output location.

Rising Star

This isn't true. Depending on what you're doing with Oozie, S3 is supported just fine as an input or output location.

 

Doesn't the coordinator expect the input path to be on HDFS, since hdfs://{nameNode} is prepended automatically? And isn't the workflow.xml required to be on HDFS?

Master Guru

@dvohra wrote:

This isn't true. Depending on what you're doing with Oozie, S3 is supported just fine as an input or output location.

 

Doesn't the coordinator expect the input path to be on HDFS, since hdfs://{nameNode} is prepended automatically? And isn't the workflow.xml required to be on HDFS?


Yes, unfortunately coordinators currently poll inputs over HDFS alone, which is a known limitation. However, writing simple workflow (WF) actions that work over S3 is still possible.

 

Yes, WFs should reside on HDFS, since Oozie treats it as its central DFS, similar to how MapReduce requires a proper DFS to run. But this shouldn't prevent simple I/O operations against an external FS such as S3.

 

I think Romain has covered the relevant JIRAs for tracking removal of this limitation.
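
To make the distinction concrete, below is a minimal sketch of such a workflow.xml (itself stored on HDFS) whose map-reduce action reads its input directly from S3. The bucket, dates, and output path are hypothetical, and the cluster would also need S3 credentials configured (e.g. Hadoop's fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey):

<!-- Sketch only: bucket name, dates, and output path are hypothetical. -->
<workflow-app name="s3-input-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <!-- Input read straight from S3: no ${nameNode} prefix. -->
                <property>
                    <name>mapred.input.dir</name>
                    <value>s3n://xxx-xxx/20130813/08</value>
                </property>
                <!-- Output written to HDFS as usual. -->
                <property>
                    <name>mapred.output.dir</name>
                    <value>${nameNode}/user/${wf:user()}/s3-demo/output</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>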

New Contributor
Are there any updates to this JIRA item? Is there a way now to specify an s3n:// location for input or output directories in an Oozie workflow, without the location being prepended with "hdfs"?

Yes, https://issues.cloudera.org/browse/HUE-1501 will be in Hue 3.5 and CDH5 beta 2 in one month (and in CDH5 in two months).

If it is urgent, you could try patching your Hue with the fix: https://issues.cloudera.org/browse/HUE-1501?focusedCommentId=19417&page=com.atlassian.jira.plugin.sy...

New Contributor

Thank you. However, I think my question is a little different from the one that JIRA addresses. I am trying to specify an input or output directory in the form "s3://..." in the Oozie workflow itself (as an input to a Hadoop MapReduce job). Do you know if this should work? I get an error saying the path can't have "s3" in it.

Yes, it should work; cf. the users above who use it without problems. What is your Oozie version?

New Contributor
I'm using "oozie-3.3.2-cdh4.5.0". The installation is working fine when input/output paths are specified in form "hdfs://…." but when specified with either "s3://…" or "s3n://" we get the following error in the logs:

Scheme of ʼs3n://...ʼ is not supported.

Where "…" equals path to input/output, verified there, verified working with hadoop commands in console.

Anything you know of I can look at?

New Contributor
Never mind, I got it.

The solution was a mix of upgrading to the previously listed version AND adding the supported filesystems property back in (see the sketch below).

thanks!
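
For reference, the "supported filesystems" property mentioned above is presumably Oozie's oozie.service.HadoopAccessorService.supported.filesystems whitelist; a minimal oozie-site.xml sketch that adds the S3 schemes:

<!-- oozie-site.xml sketch, assuming the property referred to above is the
     HadoopAccessorService whitelist. The stock default typically covers
     only hdfs/hftp/webhdfs; a value of * would allow any filesystem. -->
<property>
    <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
    <value>hdfs,hftp,webhdfs,s3,s3n</value>
</property>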

Glad to hear!

Master Guru

@Ashok wrote:

My input path is on s3n:

 

s3n://xxx-xxx/20130813/08

 

My Oozie configuration shows it as:

 

hdfs://xxx.internal:8020/s3n://xxx-xxx/20130813/08


Can you share your workflow.xml for us to validate?

 

If you're passing an S3 input or output path, make sure your workflow does not template it as ${nameNode}/${input} or similar; otherwise you're prepending an HDFS URI to a path that is already a full URI. This is most likely your issue; see the sketch below.
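
In other words, with a hypothetical ${input} parameter already carrying a full s3n:// URI:

<!-- Wrong: prepends an HDFS URI to a value that is already a full URI,
     yielding hdfs://host:8020/s3n://... -->
<property>
    <name>mapred.input.dir</name>
    <value>${nameNode}/${input}</value>
</property>

<!-- Right: pass the S3 URI through untouched. -->
<property>
    <name>mapred.input.dir</name>
    <value>${input}</value>
</property>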

Explorer

In the coordinator job I'm passing the dataset URI template as:

 

s3n://xxx-xxx/${YEAR}${MONTH}${DAY}/${HOUR}

 

and pass coord:dataOut as:

 

<property>
    <name>in_folder</name>
    <value>${coord:dataOut('in_folder')}</value>
</property>

 

and my workflow.xml input as:

 

${in_folder}

 

When I submit the coordinator job, ${nameNode} is automatically prepended, producing:

 

${nameNode}s3n://xxx-xxx/${YEAR}${MONTH}${DAY}/${HOUR}
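
Pulling those fragments together, the coordinator presumably looks roughly like the sketch below; the frequency, start/end, and initial-instance values are hypothetical:

<coordinator-app name="s3-coord" frequency="${coord:hours(1)}"
                 start="2013-08-13T00:00Z" end="2013-08-14T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
    <datasets>
        <dataset name="in_folder" frequency="${coord:hours(1)}"
                 initial-instance="2013-08-13T00:00Z" timezone="UTC">
            <uri-template>s3n://xxx-xxx/${YEAR}${MONTH}${DAY}/${HOUR}</uri-template>
        </dataset>
    </datasets>
    <output-events>
        <data-out name="in_folder" dataset="in_folder">
            <instance>${coord:current(0)}</instance>
        </data-out>
    </output-events>
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <!-- Materialized dataset URI handed to the workflow. -->
                <property>
                    <name>in_folder</name>
                    <value>${coord:dataOut('in_folder')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>

The s3n:// URI in the uri-template above is the value that ends up with ${nameNode} glued in front of it on submission.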

Good to know; Hue coordinators are only prepended with hdfs.

 

Is https://issues.apache.org/jira/browse/OOZIE-426 finished?

Explorer

FWIW, the same job works fine as a workflow when submitted via Hue. In that case we manually pass the input (S3) and output (HDFS) locations, and the job runs successfully, which establishes that the problem is not with S3 support. The problem is that when we let the coordinator pass this input (via a computed datasource), it automatically prepends hdfs://{nameNode} in front of the s3n://<> URI. Hope this clarifies.

Ok this clarifies a lot! I updated https://issues.cloudera.org/browse/HUE-1501.

Explorer

Thanks. Is this considered a bug? If so, what are some workarounds we can follow for now? Any help is appreciated.