
Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design....

Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo....

Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-...

Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-...

Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz....

Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-...

Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-...

Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-...

Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-...

Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-...

Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz...

In the last tutorial I created a coordinator called part-10-coord. I'm going to use it in this tutorial to create a bundle.

I'm personally new to bundles and only discovered them while reviewing WFM. You can learn more about bundles here: https://oozie.apache.org/docs/4.2.0/BundleFunctionalSpec.html

Bundles are designed to make working with coordinators easier by letting you manage them at a more holistic level.

A bundle is a higher-level Oozie abstraction that batches a set of coordinator applications. The user can start, stop, suspend, resume, and rerun at the bundle level, which gives better and easier operational control.

More specifically, the Oozie bundle system allows the user to define and execute a group of coordinator applications, often called a data pipeline. There is no explicit dependency among the coordinator applications in a bundle. However, a user can use the data dependencies of the coordinator applications to create an implicit data application pipeline.

Let's go to the top right-hand corner and click on create, this time selecting bundle as the choice.

12946-30-create-bundle.png

You're now prompted to enter coordinator information. Click the Add Coordinator button and fill in the details of the existing coordinator, giving the full path to the coordinator XML file.

12947-31-fill-out-bundle.png

If you provide the full path to the coordinator XML, the coordinator name is populated automatically.

12948-32-fill-out-bundle.png

If your data pipeline consists of many coordinators, you can chain them here by adding more coordinators and their paths.

Since my pipeline consists of only one coordinator (not really useful, though I can see how it would be with several), I'm going to click the green Add button to finish.

12949-33-add-coord-to-bundle.png

The last thing left to do is enter the kick-off time. It expects a date; if none is given, it defaults to NOW, which means the bundle kicks off immediately once submitted.

Bundle Application Definition

A bundle definition is defined in XML by a name, controls and one or more coordinator application specifications:

  • name: The name for the bundle job.
  • controls: The control specification for the bundle.
    • kick-off-time: Defines when the bundle job should start and submit the coordinator applications. This field is optional; the default is NOW, which means the job starts right away.
  • coordinator: Coordinator application specification. There should be at least one coordinator application in any bundle.
    • name: Name of the coordinator application. It can be used to refer to this application through the bundle for control operations such as kill, suspend, and rerun.
    • app-path: Path of the coordinator application definition in HDFS. This is a mandatory element.
    • configuration: A Hadoop-like configuration to parameterize the corresponding coordinator application. This is optional.
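For illustration, a bundle built from the elements above might look roughly like the sketch below. This is not the XML that WFM generated for this tutorial. The schema version (uri:oozie:bundle:0.2), the kick-off time, the ${nameNode} parameter, the coordinator file name coordinator.xml, and the example property are all assumptions made for the sake of the example.

```xml
<!-- Minimal bundle sketch; values marked "assumed" are illustrative only. -->
<bundle-app name="part-10-bundle" xmlns="uri:oozie:bundle:0.2">
  <controls>
    <!-- Optional; assumed timestamp. Omit the element to start immediately (NOW). -->
    <kick-off-time>2017-02-01T00:00Z</kick-off-time>
  </controls>
  <!-- At least one coordinator is required. -->
  <coordinator name="part-10-coord">
    <!-- Mandatory HDFS path; the file name coordinator.xml is assumed. -->
    <app-path>${nameNode}/user/centos/part-10/coordinator.xml</app-path>
    <!-- Optional; hypothetical property passed to the coordinator. -->
    <configuration>
      <property>
        <name>exampleProperty</name>
        <value>exampleValue</value>
      </property>
    </configuration>
  </coordinator>
</bundle-app>
```

Adding another coordinator to the pipeline would mean adding another coordinator element with its own name and app-path.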

Finally, I'm going to rename the bundle to part-10-bundle and submit it. Notice I saved it to /user/centos/part-10 along with the existing workflow called part-10 and the coordinator called part-10-coord. All three XML files will be in the part-10 directory for organization purposes, though this is not required.

12950-34-submit-bundle.png

As with workflows and coordinators, I can see my bundle runs on the Dashboard.

12951-36-see-bundle-run.png

The configuration options change a bit: I no longer see an action tab and instead see a coordinator tab. It shows all the running coordinators that belong to the bundle.

12952-37-see-coord-as-part-of-bundle-run.png

The definition tab shows all the properties required for the bundle to run. WFM makes it easy to fill out the properties, and you're no longer required to maintain a job.properties file.

12953-38-bundle-config.png

The last thing I want to do is show you the XML generated for this bundle.

12954-39-preview-bundle-xml.png

This tutorial just goes to show you how easy it is to start learning Oozie nomenclature. Before my experience with WFM, I did not know how to work with bundles, decision nodes, SLA features, etc. WFM makes working with Oozie more approachable. Until next time!

Comments

Thanks @Artem Ervits for the series of articles. Oozie has a bunch of features that go untapped because it is not easily approachable. When we started off with Workflow Designer, we wanted to make an easy-to-use editor that lets users with some knowledge of Oozie and Hadoop create workflows and explore further. Also, one of the common issues is the dashboard: the Oozie UI is dated, so we wanted to provide a new UI experience for the dashboard, even for workflows generated outside of Workflow Manager.

@Venkat Ranganathan

From my experience, I think the goal was achieved. I love this product and am planning to write more parts once blockers are addressed.