
Oozie shell action: exec and file tags


I'm new to Oozie. I've read several shell action examples, but they differ in ways that confuse me. Some examples have no <file> tag at all.

Some examples, like this one from Cloudera, repeat the shell script name in the <file> tag:

<shell xmlns="uri:oozie:shell-action:0.2">
        <exec>check-hour.sh</exec>
        <argument>${earthquakeMinThreshold}</argument>
        <file>check-hour.sh</file>
</shell>

The example on Oozie's website, meanwhile, writes the script name twice in the <file> tag, separated by # (here ${EXEC} is a reference to a job.properties entry that points to the script.sh file).

<shell xmlns="uri:oozie:shell-action:0.1">
        ...
        <exec>${EXEC}</exec>
        <argument>A</argument>
        <argument>B</argument>
        <file>${EXEC}#${EXEC}</file>
</shell> 

I've also seen examples where a path (HDFS or local?) is prepended to `script.sh#script.sh` inside the <file> tag.

<shell xmlns="uri:oozie:shell-action:0.1">
        ...
        <exec>script.sh</exec>
        <argument>A</argument>
        <argument>B</argument>
        <file>/path/script.sh#script.sh</file>
</shell> 

As I understand it, any shell script can be placed in the workflow's HDFS directory (the same directory where workflow.xml resides). Can someone explain the differences between these examples and how `<exec>`, `<file>`, `script.sh#script.sh`, and `/path/script.sh#script.sh` are used?

1 ACCEPTED SOLUTION


The exec tag runs a shell script from the local working directory of the Oozie action.

For example /hadoop/yarn/.../oozietmp/myscript.sh

You cannot know in advance which directory this will be or which server it is on. It is a temporary directory that YARN creates for the job.

The file tag is there to put something into this temp dir, and the # syntax lets you rename the file on the way in.

Say your shell script is stored in HDFS at /tmp/myfolder/myNewScript.sh, but you do not want to change the exec tag for some reason.

You can write

<file>/tmp/myfolder/myNewScript.sh#myscript.sh</file>

and Oozie will take the file from HDFS, place it in the temp folder before execution, and make it available there as myscript.sh.
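
As a rough sketch of how the two tags fit together, the action below keeps exec pointing at myscript.sh while the file tag pulls in the differently named HDFS file. The 0.2 schema version is borrowed from the example in the question, and the other elements a real action needs are left out here.

```xml
<shell xmlns="uri:oozie:shell-action:0.2">
        <!-- other required elements omitted, as in the question's examples -->
        <!-- exec names the file as it appears in the action's working directory -->
        <exec>myscript.sh</exec>
        <!-- left of '#': path in HDFS; right of '#': name in the working directory -->
        <file>/tmp/myfolder/myNewScript.sh#myscript.sh</file>
</shell>
```

The name after the # is what has to match the exec tag, not the name of the file in HDFS.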

You can use the file tag to ship any kind of file this way (jars or other dependencies, for example).

As far as I can see, ${EXEC} is just a variable they set somewhere; the name itself has no special meaning.
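
To make that concrete, here is how the Oozie website example from the question would look once the variable is substituted, assuming job.properties contains EXEC=script.sh as the question describes. Two details here come from the Oozie documentation as far as I know, not from anything tested in this thread, so treat them as assumptions: a relative path in the file tag is resolved against the workflow application directory, and when the #name part is left out the file keeps its own name.

```xml
<!-- assumes job.properties contains: EXEC=script.sh -->
<shell xmlns="uri:oozie:shell-action:0.1">
        ...
        <exec>script.sh</exec>
        <argument>A</argument>
        <argument>B</argument>
        <!-- same effect expected from <file>script.sh</file>, since the name is unchanged -->
        <file>script.sh#script.sh</file>
</shell>
```

Under those assumptions, the Cloudera form (plain file name) and the Oozie website form (name#name) would be two spellings of the same thing, and the /path/script.sh#script.sh form differs only in where the file is fetched from.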

Last but not least, if you want to avoid the file tag, you can simply put these files into a lib folder inside the workflow folder. By default, Oozie makes every file in that folder available to the action.
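
A sketch of that variant, assuming a hypothetical workflow folder laid out as shown in the comment. The folder name is made up for illustration, and the action simply has no file tag.

```xml
<!--
  Hypothetical layout in HDFS (names are illustrative only):
    my-workflow/workflow.xml
    my-workflow/lib/myscript.sh
-->
<shell xmlns="uri:oozie:shell-action:0.2">
        <!-- other required elements omitted -->
        <!-- no file tag: myscript.sh is expected to arrive via the lib folder -->
        <exec>myscript.sh</exec>
</shell>
```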
