Where is the output of an Oozie workflow stored?

avatar
Expert Contributor

I am calling a shell action in an Oozie workflow. I don't know how to access the output location to see the result. Where does Oozie store the result of the shell action?

1 ACCEPTED SOLUTION

avatar

@Alex Raj

So it appears you're calling a shell action that is expected to produce some output (on the local file system or HDFS) and you want to see that output, is that correct? Or are you actually wanting to capture the output (echo statements) of the script so you can reference those values in subsequent steps of the Oozie workflow?

If it's the latter, see the response from @Benjamin Leonhardi.

If it's the former, which I believe you are asking, then the answer is (you won't be thrilled) - it depends.

It depends on what the script is doing. I can imagine a few scenarios and will talk through them, but let us know if you are doing something different, in which case we can talk specifics. So here is what you MAY be doing in the script -

  • writing to a local file with an absolute path
  • writing to a local file with a relative path
  • writing to an HDFS file with an absolute path

Writing to a local file with an absolute path -

Let's say the script does this -

touch /tmp/a.txt

In this case, the output gets created on the local filesystem of the NodeManager where the task was executed. There is really no way to tell which one, so you would have to check all the nodes. The good thing is that you know what the absolute path is.
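
If you do go this route, one way to track the file down is to check each NodeManager host for it. A minimal sketch, assuming passwordless ssh to the worker nodes; the hostnames below are placeholders for your actual NodeManager hosts:

# Hypothetical NodeManager hostnames - replace with your own
for host in nm-host1 nm-host2 nm-host3; do
  echo "--- $host ---"
  ssh "$host" "ls -l /tmp/a.txt 2>/dev/null"
done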

Writing to a local file with a relative path -

Let's say the script does this -

touch ./a.txt

In this case, the output gets created on the local filesystem of the NodeManager where the task was executed, but relative to the temporary working directory where workflow temporary files are created. There is really no way to tell which node, and we may never even see the actual file, because the temporary files are usually cleaned up after the workflow is executed. So if the file is within that directory it will most likely be deleted.
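
To illustrate, a few extra lines in the script itself can show where a relative path actually lands. This is just a sketch; the working directory mentioned in the comment is hypothetical and varies by cluster configuration:

# Inside the shell action: print the container working directory where
# relative paths are created (e.g. somewhere under the NodeManager's
# local appcache area - hypothetical, varies by configuration)
pwd
touch ./a.txt
ls -l ./a.txt   # visible only while the action's container is alive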

Writing to an HDFS file with an absolute path <- This is the best way to set up the script, because you know where to look for the output.

Let's say the script does this -

echo "my content" >> /tmp/a.txt
hdfs dfs -put /tmp/a.txt /tmp/a.txt

In this case, the output gets created on HDFS and you know the path, so it's easy to find.
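
Once the file is on HDFS, you can read it back from any machine with an HDFS client, for example:

hdfs dfs -ls /tmp/a.txt
hdfs dfs -cat /tmp/a.txt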

If you are not already following the last approach, I would recommend it.

Hope this helps.


9 REPLIES

avatar
Master Mentor

Please see this; you need to drill down to the job to see the results: https://community.hortonworks.com/content/kbentry/9148/troubleshooting-an-oozie-flow.html

avatar
Master Guru

Just in addition to what Artem said (Oozie stores the output of an action in its launcher logs, so you have to drill through the logs).
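
As a rough sketch of what "drilling through the logs" looks like from the command line (the workflow and application IDs below are placeholders, not real values):

# Find the action's external (YARN) application ID from the workflow ID
oozie job -info 0000001-200101000000000-oozie-oozi-W

# Then pull the launcher container logs, which include the shell action's stdout
yarn logs -applicationId application_1500000000000_0001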

If, however, you want to automatically react to the output of the shell action, you can do that as well with the capture-output tag. Your shell command needs to output key=value pairs, and Oozie will read them and add them to the flow as variables. So if your load.sh did an "echo output=success" at the end, the flow below would go to success; if not, the flow would be killed.

<action name="load-files">
        <shell xmlns="uri:oozie:ssh-action:0.1">
            ...
            <command>load.sh</command>
             <capture-output/>
        </shell>
        <ok to="check-if-data"/>
        <error to="kill"/>
    </action>
	
	<decision name="check-if-data">
        <switch>
            <case to="end">${ wf:actionData('load-files')['output'] eq 'success'}</case>
            <default to="kill" />
        </switch>
    </decision>
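
For reference, the load.sh used above might look roughly like this - a minimal sketch where the actual load logic is a hypothetical placeholder; the only part Oozie cares about for <capture-output/> is the key=value line printed on stdout:

#!/bin/bash
# load.sh - hypothetical sketch of the script called by the action above
if hdfs dfs -test -e /data/incoming/part-00000; then   # placeholder check for "data loaded"
  echo "output=success"    # read by <capture-output/> as wf:actionData('load-files')['output']
else
  echo "output=failure"
fi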

avatar
Expert Contributor
So if your load.sh produces the output below:

echo "output_1=success_1 output_2=success_2"

how do I access the success_2 value in the switch case?

Clarification 1: "echo output_1=success_1 output_2=success_2"
Is the syntax below correct?

<case to="end">${wf:actionData('load-files')['output_2'] eq 'success_2'}</case>

Clarification 2:
To pass success_2 as a parameter to a Sqoop action, can we use it like this? Is it correct?

<arg>${wf:actionData('load-files')['output_2']}</arg>



avatar
Expert Contributor

Hi all,

Any input on my clarifications? I faced this scenario one more time.


avatar

The last option works fine.

avatar
Master Guru

Hi @Alex Raj, were the answers helpful? If so, please consider accepting one and/or up-voting them. Thanks!

avatar

You can see the output in the Job Tracker.

avatar

Running a script as an Oozie action, I'm missing (it's not logged anywhere) the output from Spark log4j and Python logging. I can only see stdout from Python (the print() function) and bash. I haven't figured out why the loggers don't work in Oozie.