Oozie: how to use the output of an action as arguments for the next action

Explorer

Hello,

I'm new to Oozie and I've been trying to figure out how to use the output of one action as the input of the next action. I come from traditional ETL tools, where the output of one step can be passed as arguments to the next step. My question is whether the following are possible with Oozie:

  1. Use the output of one action as the input of the next action
  2. Execute a subsequent action "n" times according to the output of the first action, like sub-jobs.

To find out whether this is possible, we are trying to run a very basic dump example with two Hive actions: one that reads a table and another that writes it. Something like the following example, created with the Hue Oozie Editor:


<workflow-app name="Workflow" xmlns="uri:oozie:workflow:0.5">
  <start to="hive-04f4"/>
  <kill name="Kill">
    <message>Action failed, error message [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <action name="hive-04f4" cred="hive2">
    <hive2 xmlns="uri:oozie:hive2-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <jdbc-url>jdbc:hive2://host:10000/default</jdbc-url>
      <script>${wf:appPath()}/hive-04f4.sql</script>
    </hive2>
    <ok to="hive-1c24"/>
    <error to="Kill"/>
  </action>
  <action name="hive-1c24" cred="hive2">
    <hive2 xmlns="uri:oozie:hive2-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <jdbc-url>jdbc:hive2://host:10000/default</jdbc-url>
      <script>${wf:appPath()}/hive-1c24.sql</script>
      <param>fecha=${fecha_insercion}</param>
      <param>registro=${now}</param>
    </hive2>
    <ok to="End"/>
    <error to="Kill"/>
  </action>
  <end name="End"/>
</workflow-app>


Accepted solution

Master Collaborator

Hi @Sokka, I think it's possible. Can you try the following?

<workflow-app name="Workflow" xmlns="uri:oozie:workflow:0.5">
  <start to="shell-04f4"/>
  <kill name="Kill">
    <message>Action failed, error message [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>

  <!-- First action to read data. capture-output is only supported by the shell, java and
       ssh actions (the hive2 action schema rejects it), so the read runs in a shell script.
       read_value.sh is a hypothetical script that queries Hive (for example with beeline)
       and prints its result to stdout in Java properties format, e.g. output=some_value -->
  <action name="shell-04f4">
    <shell xmlns="uri:oozie:shell-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>read_value.sh</exec>
      <file>${wf:appPath()}/read_value.sh#read_value.sh</file>
      <!-- Capture stdout so later nodes can read it with wf:actionData() -->
      <capture-output/>
    </shell>
    <ok to="check-output"/>
    <error to="Kill"/>
  </action>

  <!-- Decision node to determine whether to execute the next action -->
  <decision name="check-output">
    <switch>
      <!-- If output is not null, execute the next action -->
      <case to="hive-1c24">${wf:actionData('shell-04f4')['output'] != null}</case>
      <!-- If output is null, end the workflow (default must be the last element inside switch) -->
      <default to="End"/>
    </switch>
  </decision>

  <!-- Second Hive action to write data -->
  <action name="hive-1c24" cred="hive2">
    <hive2 xmlns="uri:oozie:hive2-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <jdbc-url>jdbc:hive2://host:10000/default</jdbc-url>
      <script>${wf:appPath()}/hive-1c24.sql</script>
      <!-- Use output from previous action as input parameter -->
      <param>input=${wf:actionData('shell-04f4')['output']}</param>
    </hive2>
    <ok to="End"/>
    <error to="Kill"/>
  </action>

  <end name="End"/>
</workflow-app>

The first action is a shell action because <capture-output/> is only accepted by the shell, java and ssh actions. Its stdout must be in Java properties format (key=value lines, limited to 2 KB by default), and wf:actionData('shell-04f4') returns those values as a map. The <decision> node (check-output) contains a <switch> element with a single <case> element that checks whether the captured output value is not null. If it is not null, the workflow executes the second Hive action (hive-1c24), which receives the value as the input parameter. If it is null, it takes the <default> path, which ends the workflow. The second action then goes straight to End: an Oozie workflow must be acyclic and a <join> can only close a <fork>, so an action cannot transition back to the decision node to form a loop.
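For point 2 (running an action n times), keep in mind that Oozie validates each workflow as a directed acyclic graph and rejects transitions that form a cycle, so an action cannot route back to an earlier decision node to repeat itself. A workaround that is sometimes used is a workflow that calls itself as a sub-workflow with an incremented counter. The following is only a minimal sketch under these assumptions: loop-wf is a hypothetical application directory holding this workflow.xml, hive-1c24.sql is the write script from the question, counter and n are illustrative parameter names, and cred="hive2" (with its credentials section) is left out for brevity. Every iteration nests one sub-workflow level deeper, so n is bounded by oozie.action.subworkflow.max.depth (50 by default).

```xml
<!-- Hypothetical loop-wf/workflow.xml: runs the write script once per call and
     then calls itself as a sub-workflow until counter reaches n -->
<workflow-app name="loop-wf" xmlns="uri:oozie:workflow:0.5">
  <parameters>
    <!-- Iteration counter; starts at 0 unless the caller sets it -->
    <property>
      <name>counter</name>
      <value>0</value>
    </property>
    <!-- Number of iterations; no default, so the caller must supply it -->
    <property>
      <name>n</name>
    </property>
  </parameters>
  <start to="check-counter"/>
  <kill name="Kill">
    <message>Action failed, error message [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>

  <!-- Adding 0 coerces the string parameters to numbers, so EL compares them
       numerically instead of lexically (as strings, 10 would sort before 9) -->
  <decision name="check-counter">
    <switch>
      <case to="hive-write">${(counter + 0) lt (n + 0)}</case>
      <default to="End"/>
    </switch>
  </decision>

  <!-- One iteration of the write step; the current counter is passed to the script -->
  <action name="hive-write">
    <hive2 xmlns="uri:oozie:hive2-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <jdbc-url>jdbc:hive2://host:10000/default</jdbc-url>
      <script>${wf:appPath()}/hive-1c24.sql</script>
      <param>iteration=${counter}</param>
    </hive2>
    <ok to="next-iteration"/>
    <error to="Kill"/>
  </action>

  <!-- Re-run this same workflow with counter incremented; the parent waits
       for the child, so each iteration adds one level of nesting -->
  <action name="next-iteration">
    <sub-workflow>
      <app-path>${wf:appPath()}</app-path>
      <propagate-configuration/>
      <configuration>
        <property>
          <name>counter</name>
          <value>${counter + 1}</value>
        </property>
      </configuration>
    </sub-workflow>
    <ok to="End"/>
    <error to="Kill"/>
  </action>

  <end name="End"/>
</workflow-app>
```

The calling workflow could then start the loop with a sub-workflow action that sets n from a captured value. In this sketch read-step is a hypothetical shell action with <capture-output/> that prints a line such as n=3, and the app-path is a placeholder:

```xml
<!-- In the calling workflow: start loop-wf with n taken from read-step's captured output -->
<action name="run-loop">
  <sub-workflow>
    <!-- Placeholder path to the hypothetical loop-wf application directory -->
    <app-path>${nameNode}/path/to/loop-wf</app-path>
    <propagate-configuration/>
    <configuration>
      <property>
        <name>n</name>
        <value>${wf:actionData('read-step')['n']}</value>
      </property>
    </configuration>
  </sub-workflow>
  <ok to="End"/>
  <error to="Kill"/>
</action>
```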


Regards,

Chethan YM

Community Manager

@Sokka, did the response help resolve your query? If so, please mark the relevant reply as the solution; this will help others find the answer more easily in the future.



Regards,

Vidya Sargur,
Community Manager

