Posts: 12
Registered: ‎12-15-2015

Unable to remove fork/join tags from Hue/Oozie editor after removing parallel steps.

I'm a bit new to working with Oozie, so please bear with me if I'm missing something basic.


I'm creating an Oozie workflow that executes the com.github.ehiggs.spark.terasort libraries (terasort.jar) on a Cloudera cluster.


After a fair bit of struggling, I was able to get it working, but I wasn't satisfied with the process.  


I had originally created three Hadoop FS actions to delete the output directories produced by previous TeraGen/TeraSort/TeraValidate executions, and then I used the Hue Oozie editor to make them parallel.  After the parallel steps, I added steps to execute the Spark programs. (The online documentation on getting the Spark steps to work correctly is still a bit incomplete.)  After I got everything working, I looked at ways to optimize the process.


First, I saw how to perform the file system steps as part of the <prepare> portion of the TeraGen step.  This would allow me to eliminate the parallel steps and also make the status bar give a more accurate reading of how far the workflow has progressed.
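
For illustration, here is a minimal sketch of the kind of <prepare> block I mean, assuming the spark-action 0.1 schema. The action name, master value, class name, jar location, and the ${teragenOutput} parameter are placeholders I made up for this example, not values copied from my workflow.

```xml
<action name="spark-teragen">  <!-- hypothetical action name -->
  <spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- prepare runs before the Spark job is launched, so these deletes
         take the place of the separate parallel fs actions -->
    <prepare>
      <delete path="${nameNode}/user/davidw/terasort-benchmark.out"/>
      <delete path="${nameNode}/user/davidw/teravalidate-benchmark.out"/>
    </prepare>
    <master>yarn-cluster</master>  <!-- assumed; depends on the cluster setup -->
    <name>Spark - Execute TeraGen - 1GB</name>
    <class>com.github.ehiggs.spark.terasort.TeraGen</class>  <!-- assumed class name -->
    <jar>${nameNode}/user/davidw/terasort.jar</jar>  <!-- assumed jar location -->
    <arg>1g</arg>
    <arg>${teragenOutput}</arg>  <!-- hypothetical workflow parameter -->
  </spark>
  <ok to="End"/>
  <error to="Kill"/>
</action>
```

With the deletes inside the action, the workflow needs no fork/join at all and <start> can point straight at the Spark action.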


Once I had added the directory deletion steps in the <prepare> section, I then deleted the parallel actions to perform the deletions.  This is where I ran into problems.  I started getting an error:


E0701: XML schema error, cvc-complex-type.2.4.b: The content of element 'fork' is not complete. One of '{"uri:oozie:workflow:0.5":path}' is expected.
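
As far as I can tell, the error means the schema expects <path> children inside every <fork>, so a fork that is left empty fails validation. A minimal sketch of the difference follows; the fs-delete-* action names are made up for illustration, and the two fragments are alternatives, not parts of the same file.

```xml
<!-- Fragment 1: an empty fork, which is rejected with E0701 -->
<fork name="fork-736f">
</fork>

<!-- Fragment 2: a fork that validates, with one path per parallel branch.
     Each branch must eventually transition to the matching join. -->
<fork name="fork-736f">
  <path start="fs-delete-a"/>  <!-- hypothetical action name -->
  <path start="fs-delete-b"/>  <!-- hypothetical action name -->
</fork>
<join name="join-bd18" to="spark-e177"/>
```

Since I no longer have any parallel branches, what I actually want is for both the <fork> and the <join> to be removed, not filled in.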


When I look at the Oozie XML that's generated by the Hue editor, I can see that the <fork> and <join> nodes weren't deleted when I deleted the parallel steps.  I haven't figured out how to delete them.  I can edit the file, but the next time I go into the graphical editor, it overwrites my edits and re-adds the incomplete nodes. (See the snippet from the workflow.xml file and the graphical workflow below.)


Is there a good way to fix this?  I've started another separate Oozie workflow, but I'm struggling again with getting the Spark actions to work correctly.


Thanks in advance.


David Webb


<workflow-app name="TeraGen_-_TeraSort_-_TeraValidate_-_1GB-" xmlns="uri:oozie:workflow:0.5">
  <start to="spark-e177"/>
  <kill name="Kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <fork name="fork-736f">
  </fork>
  <join name="join-bd18" to="End"/>
  <action name="spark-e177">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <delete path="${nameNode}/user/davidw/teravalidate-benchmark.out"/>
        <delete path="${nameNode}/user/davidw/terasort-benchmark.out"/>
        <delete path="${nameNode}/user/davidw/"/>



[Screenshot: graphical workflow "TeraGen - TeraSort - TeraValidate - 1GB-copy" (Execute TeraGen, TeraSort, and TeraValidate with a 1GB dataset), with nodes labeled "Spark - Execute TeraGen - 1GB"]




Re: Unable to remove fork/join tags from Hue/Oozie editor after removing parallel steps.

Just a quick update. I also found that if I set the transition on the first step to "End" and then delete the first step, the saved workflow begins with <start to="End"/>.
