
oozie loop: configuration parameters not updated


Contributor

Dear all, based on this article I'm trying to create a loop in Oozie.

I've simplified the code a bit: I only need a list, not a range, so I removed the first workflow and start directly with loop.xml (renamed to workflow.xml). I also changed the workflow schema to the latest version, 0.5.

When I start the workflow, the loop_value and loop_list configuration parameters are passed to the sub-workflow. loop_value is a new parameter and is passed correctly: it holds the first string of the comma-separated loop_list parameter (defined in the job.properties file). loop_list, on the other hand, is not updated with the new value (everything but the first string), which results in an endless loop. When I duplicate the loop_list expression under a new name, loop_list2, that new parameter is passed to the sub-workflow successfully.

Here are the XML files (I removed the default comments here; they are still present in my original files):

master workflow:

<workflow-app name="loop_touchz" xmlns="uri:oozie:workflow:0.5">
  <start to="loop_list"/>

  <action name="loop_list">
    <sub-workflow>
      <app-path>${appPath}/loop_list_step.xml</app-path>
      <propagate-configuration/>
      <configuration>
        <property>
          <name>loop_value</name>
          <value>${replaceAll(replaceAll(loop_list, '([^,]*),?(.*)', '$1'), '^$', '--NOVALUE--')}</value>
        </property>
        <property>
          <name>loop_list</name>
          <value>${replaceAll(replaceAll(loop_list, '([^,]*),?(.*)', '$2'), '^$', '--ENDOFLIST--')}</value>
        </property>
        <property>
          <name>loop_list2</name>
          <value>${replaceAll(replaceAll(loop_list, '([^,]*),?(.*)', '$2'), '^$', '--ENDOFLIST--')}</value>
        </property>
      </configuration>
    </sub-workflow>
    <ok to="end"/>
    <error to="error"/>
  </action>

  <kill name="error">
    <message>Oops!</message>
  </kill>

  <end name="end"/>
</workflow-app>

sub-workflow loop_list_step:

<workflow-app name="loop_touchz_(${loop_value})" xmlns="uri:oozie:workflow:0.5">
  <start to="end"/>
     
  <fork name="fork">
    <path start="run_parallel"/>
    <path start="check_continue_parallel"/>
  </fork>

  <action name="run_parallel">
    <sub-workflow>
      <app-path>${appPath}/touchz.xml</app-path>
      <propagate-configuration/>
    </sub-workflow>
    <ok to="join"/>
    <error to="error"/>
  </action>

  <decision name="check_continue_parallel">
    <switch>
      <case to="continue_parallel">${loop_list ne "--ENDOFLIST--"}</case>
      <default to="join"/>
    </switch>
  </decision>

  <action name="continue_parallel">
    <sub-workflow>
      <app-path>${wf:appPath()}</app-path>
      <propagate-configuration/>
      <configuration>
        <property>
          <name>loop_value</name>
          <value>${replaceAll(replaceAll(loop_list, '([^,]*),?(.*)', '$1'), '^$', '--NOVALUE--')}</value>
        </property>
        <property>
          <name>loop_value2</name>
          <value>${replaceAll(replaceAll(loop_list2, '([^,]*),?(.*)', '$1'), '^$', '--NOVALUE--')}</value>
        </property>
        <property>
          <name>loop_list</name>
          <value>${replaceAll(replaceAll(loop_list, '([^,]*),?(.*)', '$2'), '^$', '--ENDOFLIST--')}</value>
        </property>
        <property>
          <name>loop_list2</name>
          <value>${replaceAll(replaceAll(loop_list2, '([^,]*),?(.*)', '$2'), '^$', '--ENDOFLIST--')}</value>
        </property>
        <property>
          <name>loop_list3</name>
          <value>${replaceAll(replaceAll(loop_list2, '([^,]*),?(.*)', '$2'), '^$', '--ENDOFLIST--')}</value>
        </property>
      </configuration>
    </sub-workflow>
    <ok to="join"/>
    <error to="error"/>
  </action>

  <join name="join" to="end"/>

  <kill name="error">
    <message>Oops!</message>
  </kill>

  <end name="end"/>
</workflow-app>

I route the sub-workflow directly to 'end' to prevent the endless loop (so touchz.xml is not used in this example). When I check the execution in the Oozie Workflows Dashboard, I see the following parameters in the configuration tab:

loop_list: quick,brown,fox,jumps,over,the,lazy,dog
loop_list2: brown,fox,jumps,over,the,lazy,dog
loop_value: quick

It looks like I'm not allowed to overwrite an already existing parameter. Is there a setting in Oozie that I need to change? I thought the configuration block of a sub-workflow action was supposed to override previous configurations.

Any thoughts? I can't find anything wrong in the workflow, so I'm starting to think there is a bug in Oozie.

By the way, our platform is running HDP 2.5.


Re: oozie loop: configuration parameters not updated

Contributor

I just tested the same script in the Hortonworks Docker Sandbox (2.5) and I encounter the same problem there. An already existing variable cannot be overridden when passing a new value for that variable to a sub-workflow, even though it should be possible. I'm going to test it one more time with the default scripts as provided by @jyoung...

Re: oozie loop: configuration parameters not updated

Contributor

OK, even the original script (from Jeremy Beard) doesn't work in the sandbox. I get an endless loop in Oozie (which is automatically killed after 50 sub-workflows). It seems to be a problem with Oozie: it no longer allows updating existing parameters when starting a sub-workflow action. The documentation states that parameters defined in an action should override any parameters defined globally or in job.properties. Bug?

Re: oozie loop: configuration parameters not updated

New Contributor

Hello, can I ask what is the end goal you're trying to accomplish? What kind of task are you trying to perform and what is the result you want to achieve? Maybe we can figure out a work-around, but I'm not able to follow all of the modifications without knowing the desired result.

Re: oozie loop: configuration parameters not updated

Contributor

The end goal is to iterate over a list that will be created by a shell script (different every time).

I don't think you need to understand the modifications. I even tried it with the default script from Jeremy Beard, and that workflow currently doesn't work on our HDP 2.5 cluster either, nor in the latest sandbox. Parameters seem to be persistent, and overriding them through the configuration section of the workflow or action is not allowed.


Re: oozie loop: configuration parameters not updated

New Contributor

I've created a new Oozie workflow application which again uses Jeremy Beard's oozieloop project. However this demo does not use a hard-coded loop_list in the job.properties file. Instead I use a shell script and Oozie shell action to read a randomly selected row from a CSV file. That comma separated row of data dynamically populates the loop_list variable. The loop_list is iterated over by the oozieloop project's loop.xml file. For each element in the loop_list, a separate sub-workflow is called and a new file with that element's value as the filename is created in HDFS. In my demo, I do not modify Jeremy Beard's oozieloop xml files other than replacing occurrences of "/your/path/to/" with "${wfDir}".

The wiki for the demo is here: https://github.com/jlyoung/advancedoozieworkflows/wiki/Oozieloop-with-dynamically-populated-loop_lis...

All of the code can be found here: https://github.com/jlyoung/advancedoozieworkflows/tree/master/dynamiclistloop
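
For readers who only want the general shape of the approach described above, here is a minimal sketch, not the code from the linked repository. It assumes a hypothetical script `build_list.sh` that prints a single line such as `loop_list=quick,brown,fox` to stdout, and it assumes the usual `jobTracker` and `nameNode` properties are defined in job.properties. Any further parameters that loop.xml may expect are left out.

```xml
<workflow-app name="dynamic_loop_list_sketch" xmlns="uri:oozie:workflow:0.5">
  <start to="build_list"/>

  <!-- Hypothetical shell action. With capture-output, Oozie reads the
       script's stdout as key=value pairs (Java properties format). -->
  <action name="build_list">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>build_list.sh</exec>
      <!-- build_list.sh is an assumed file name, shipped with the workflow -->
      <file>${wfDir}/build_list.sh#build_list.sh</file>
      <capture-output/>
    </shell>
    <ok to="loop"/>
    <error to="error"/>
  </action>

  <!-- Pass the captured value to the looping sub-workflow as loop_list. -->
  <action name="loop">
    <sub-workflow>
      <app-path>${wfDir}/loop.xml</app-path>
      <propagate-configuration/>
      <configuration>
        <property>
          <name>loop_list</name>
          <value>${wf:actionData('build_list')['loop_list']}</value>
        </property>
      </configuration>
    </sub-workflow>
    <ok to="end"/>
    <error to="error"/>
  </action>

  <kill name="error">
    <message>Oops!</message>
  </kill>

  <end name="end"/>
</workflow-app>
```

Because loop_list is not set in job.properties in this sketch, the top-level sub-workflow receives it as a new parameter rather than as an override of an existing one.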

Re: oozie loop: configuration parameters not updated

Contributor

Apparently this was a bug in Oozie that was fixed last year. Unfortunately, it looks like Hortonworks hasn't applied this patch in its latest distribution:

https://issues.apache.org/jira/browse/OOZIE-2649

Re: oozie loop: configuration parameters not updated

Rising Star

Rene Sluiter

Thanks for reporting this issue.

When HDP 2.5 was released, the applicable features from the Oozie 4.3 release (which was still in progress at the time) were included, but this fix was not yet available. Oozie 4.3.0 was released only a few weeks ago. Could you create a bug request so that we can get this into an upcoming maintenance release?

Re: oozie loop: configuration parameters not updated

Contributor

Thanks for clarifying, @Venkat Ranganathan. I don't have a personal support account, so I'm not sure whether I can create the bug request you mention. I did inform our direct Hortonworks contact at the client, but if I can create the bug request myself, I'm happy to do so.