Support Questions

Sokka · 01-24-2024

Hello,

I'm new to Oozie and I've been trying to figure out how to use the output of one action as the input of the next action. I come from traditional ETL tools, where the output of one step can be passed as arguments to the next step. My question is whether the following are possible with Oozie:

1. Use the output of one action as the input of the next action.
2. Run a subsequent action "n" times, depending on the output of the first action (like sub-jobs or something similar).

To find out whether this is possible, we are trying to run a very basic dummy example with two Hive actions: one reads a table and the other writes. Something like the following workflow, created with the Hue Oozie Editor:

<workflow-app name="Workflow" xmlns="uri:oozie:workflow:0.5">
<start to="hive-04f4"/>
<kill name="Kill">
<message>Error performing the action. Error message [${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="hive-04f4" cred="hive2">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<jdbc-url>jdbc:hive2://host:10000/default</jdbc-url>
<script>${wf:appPath()}/hive-04f4.sql</script>
</hive2>
<ok to="hive-1c24"/>
<error to="Kill"/>
</action>
<action name="hive-1c24" cred="hive2">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<jdbc-url>jdbc:hive2://host:10000/default</jdbc-url>
<script>${wf:appPath()}/hive-1c24.sql</script>
<param>fecha=${fecha_insercion}</param>
<param>registro=${now}</param>
</hive2>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>

ChethanYM · 02-06-2024

Hi @Sokka, I think it's possible. Can you try the below?

"""<workflow-app name="Workflow" xmlns="uri:oozie:workflow:0.5">
<start to="hive-04f4"/>
<kill name="Kill">
<message>Error performing the action. Error message [${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>


<action name="hive-04f4" cred="hive2">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<jdbc-url>jdbc:hive2://host:10000/default</jdbc-url>
<script>${wf:appPath()}/hive-04f4.sql</script>

<capture-output/>
</hive2>
<ok to="loop-decision"/>
<error to="Kill"/>
</action>


<decision name="loop-decision">
<switch>

<case to="hive-1c24">${wf:actionData('hive-04f4')['output'] != null}</case>
<!-- <default> must be a child of <switch> in the workflow schema -->
<default to="End"/>
</switch>
</decision>


<action name="hive-1c24" cred="hive2">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<jdbc-url>jdbc:hive2://host:10000/default</jdbc-url>
<script>${wf:appPath()}/hive-1c24.sql</script>

<param>input=${wf:actionData('hive-04f4')['output']}</param>
</hive2>
<!-- Workflow definitions must be acyclic and a <join> needs a matching <fork>, so go straight to End -->
<ok to="End"/>
<error to="Kill"/>
</action>

<end name="End"/>
</workflow-app>"""

The <decision> node (loop-decision) contains a <switch> element with a single <case> element to check if the output of the first Hive action (hive-04f4) is not null. If it's not null, it proceeds to execute the second Hive action (hive-1c24). If it is null, it goes to the <default> path, which ends the workflow.
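
One caveat on capturing the output: as far as I know, <capture-output/> is documented for the shell, java and ssh actions, and the hive2 action schema does not list it, so wf:actionData('hive-04f4') may come back empty. Below is a minimal sketch of the same idea, assuming the value can be produced by a shell action that prints key=value lines (Java properties format) to stdout. The node names get-value and check-value and the sample value 2024-01-24 are hypothetical; the fecha key and the hive-1c24 action are taken from the workflow in the question.

```xml
<workflow-app name="capture-output-sketch" xmlns="uri:oozie:workflow:0.5">
    <start to="get-value"/>
    <kill name="Kill">
        <message>Action failed. Error message [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <!-- Hypothetical producer: prints one key=value line to stdout.
         A real job would replace echo with a script that computes the value. -->
    <action name="get-value">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>echo</exec>
            <argument>fecha=2024-01-24</argument>
            <!-- Makes the stdout properties readable through wf:actionData() -->
            <capture-output/>
        </shell>
        <ok to="check-value"/>
        <error to="Kill"/>
    </action>

    <!-- Run the Hive action only if the shell action produced a value -->
    <decision name="check-value">
        <switch>
            <case to="hive-1c24">${wf:actionData('get-value')['fecha'] != null}</case>
            <default to="End"/>
        </switch>
    </decision>

    <!-- On a secured cluster, keep cred="hive2" and the credentials section
         from the original workflow; both are left out here for brevity. -->
    <action name="hive-1c24">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <jdbc-url>jdbc:hive2://host:10000/default</jdbc-url>
            <script>${wf:appPath()}/hive-1c24.sql</script>
            <!-- Captured value from the previous action, passed as a script parameter -->
            <param>fecha=${wf:actionData('get-value')['fecha']}</param>
        </hive2>
        <ok to="End"/>
        <error to="Kill"/>
    </action>

    <end name="End"/>
</workflow-app>
```

The script hive-1c24.sql would read the fecha parameter the same way it already does in the original workflow. Note that this only covers passing a value from one action to the next. Workflow definitions have to be acyclic, so it does not give you an "n" times loop by itself.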

Regards,

Chethan YM

VidyaSargur · 02-12-2024

@Sokka, did the response help resolve your query? If it did, kindly mark the relevant reply as the solution, as it will help others find the answer more easily in the future.

Regards,

Vidya Sargur,
Community Manager

Oozie, how to use the output of an action as arguments for next action