Can Oozie curl from an external API and write to HDFS?

I'm trying to create a workflow with a shell action in Oozie and am running into some issues. In the first stage of the workflow, I'm attempting to curl an external API from a local path on an edge node (on an HWX cluster) and then write the results directly into HDFS. However, that action fails with a decidedly unhelpful "Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]" exception.

My guess is that either Oozie can't see both the local path (to call curl) and HDFS (to write out the results), or that curl (via the -o argument) doesn't support writing to HDFS. Any ideas?

Thanks- Kevin

1 ACCEPTED SOLUTION

As you say, this could be anything. In general, a shell action in Oozie runs like any other shell script; however, you need to upload every file it needs using <file> tags, since the action could be executed on any datanode, in a YARN temporary folder. If you want to interact with a kerberized cluster, you need to run kinit in the shell command (and you also need to upload the keytab to the temp folder of the shell action using a <file> tag).
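To make that concrete, here is a minimal sketch of what such a script could look like. It is not the original poster's script: the URL, HDFS path, keytab name, and principal in the example call are hypothetical placeholders. It relies on the fact that curl's -o option only writes to the local filesystem, so the response is piped into `hdfs dfs -put -`, which reads from stdin.

```shell
#!/usr/bin/env bash
# Sketch only: all names and paths in the example call below are hypothetical.
set -euo pipefail

# Get a Kerberos ticket from a keytab that was shipped into the action's
# working directory with a <file> tag. Only needed on a kerberized cluster.
login_with_keytab() {
  local keytab="$1" principal="$2"
  kinit -kt "$keytab" "$principal"
}

# Stream the API response straight into HDFS instead of using curl -o.
# --fail makes curl exit non-zero on HTTP errors; with pipefail set above,
# that failure becomes the exit status of the whole pipeline.
fetch_to_hdfs() {
  local url="$1" target="$2"
  curl --fail --silent --show-error "$url" | hdfs dfs -put -f - "$target"
}

# Example call (hypothetical values):
#   login_with_keytab my.keytab myuser@EXAMPLE.COM
#   fetch_to_hdfs "https://api.example.com/data" /user/myuser/landing/data.json
```

Note that both the script itself and the keytab would have to be listed in <file> tags of the shell action, and the node running the action needs outbound access to the API.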

But I think you first need to find out what the problem is: look at the logs of the map task executing the shell action. You will not find this in the Oozie logs.
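One way to get at those logs from the command line, as a rough sketch: Oozie reports an external Hadoop job ID for the action (of the form job_<timestamp>_<sequence>), and the matching YARN application ID uses the same numbers with an application_ prefix. The helper names below are made up for illustration, and `yarn logs` only returns something if log aggregation is enabled on the cluster.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; the job ID comes from Oozie.
set -euo pipefail

# Turn a Hadoop job ID (job_..._....) into the YARN application ID.
job_to_app_id() {
  local job_id="$1"
  echo "application_${job_id#job_}"
}

# Print the aggregated container logs (stdout/stderr of the shell script
# included) for the launcher job behind an Oozie action.
show_launcher_logs() {
  local job_id="$1"
  yarn logs -applicationId "$(job_to_app_id "$job_id")"
}

# Example call (hypothetical ID):
#   show_launcher_logs job_1461234567890_0042
```

The stderr of curl or of the hdfs command usually shows up there, which is far more telling than the "exit code [1]" that Oozie reports.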

Also, it doesn't sound like you have tested your shell script locally, so that would be the very first step. I.e., make sure that your script runs successfully in a local environment, curl and all.

Good luck.

2 REPLIES

Thanks Benjamin! The script definitely works locally, and I should have mentioned that. This effort is really just to get the thing fully automated and running without a man in the loop.

As for the logs, you are absolutely right. I was able to figure out how to cross-reference the Oozie web UI "logs" with the actual YARN logs via the map ID. After digging into these and talking to some folks, I was able to figure out how to get the curl to run as a shell action in HDFS and write out to HDFS.

Thanks for the assist!

-Kevin