Created on 02-15-2017 07:54 PM - edited 08-17-2019 02:13 PM
In this tutorial, we're going to leverage Oozie's SLA monitoring features via Workflow Manager. To read more about SLA features in Oozie, see the official documentation: https://oozie.apache.org/docs/4.2.0/DG_SLAMonitoring.html
We will begin with a simple shell action that sleeps for a fixed duration. Create a new file called script.sh and paste in the code below.
echo "start of script execution"
sleep 60
echo "end of script execution"
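If you would rather create the file from the command line than paste it into an editor, a minimal sketch is shown here. The `#!/bin/bash` line is my assumption and is not part of the original script. The `bash -n` call only parses the file, so the check returns immediately instead of sleeping for a minute.

```shell
# Write the test script locally; the quoted 'EOF' keeps the body literal.
# The shebang line is an assumption, not part of the original script.
cat > script.sh <<'EOF'
#!/bin/bash
echo "start of script execution"
sleep 60
echo "end of script execution"
EOF

# Parse the script without executing it.
bash -n script.sh && echo "script.sh: syntax OK"
```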
We're also going to create a workflow HDFS directory and upload this script to it.
hdfs dfs -mkdir -p oozie/shell-sla
hdfs dfs -put script.sh oozie/shell-sla/
Let's begin with adding a shell action and populating the script name and file attribute.
Don't forget to check the capture output box. We want to see the output of this action.
I want to submit the workflow to make sure everything works as expected before configuring SLA features.
It's a good idea to preview the XML to make sure the file and exec tags are filled in correctly.
Once the job completes, I want to drill down into it and view the output.
Everything looks good, so we're ready to enable the SLA features of Oozie via Workflow Manager. Click on the shell action and then the gear icon. At the bottom of the configuration page, you will see the SLA section. Expand it and check the enabled box.
Each field is described as below:
I'm going to simulate each of the SLA miss patterns: my job started later than scheduled, my job completed outside the SLA threshold and, finally, my job took longer to complete than expected. To fill out the nominal time, use the date and clock icons below the date picker to set the correct time. Click x when ready.
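To make the three miss patterns concrete, here is a rough sketch of the comparisons involved. It is not Oozie's implementation. It assumes should-start and should-end are offsets from the nominal time, as the Oozie docs describe. `sla_check` is a hypothetical helper, all values are plain seconds, and the numbers in the usage lines are made up for illustration.

```shell
# sla_check NOMINAL SHOULD_START SHOULD_END MAX_DURATION ACTUAL_START ACTUAL_END
# Hypothetical helper: every argument is a number of seconds.
# SHOULD_START and SHOULD_END are offsets from NOMINAL.
sla_check() (
  nominal=$1; should_start=$2; should_end=$3; max_duration=$4
  actual_start=$5; actual_end=$6
  missed=""
  # Started later than nominal time + should-start offset.
  if [ "$actual_start" -gt $((nominal + should_start)) ]; then
    missed="$missed START_MISS"
  fi
  # Finished later than nominal time + should-end offset.
  if [ "$actual_end" -gt $((nominal + should_end)) ]; then
    missed="$missed END_MISS"
  fi
  # Ran longer than the allowed maximum duration.
  if [ $((actual_end - actual_start)) -gt "$max_duration" ]; then
    missed="$missed DURATION_MISS"
  fi
  if [ -z "$missed" ]; then echo "none"; else echo "${missed# }"; fi
)

# A 120-second run that starts 300 seconds after the nominal time:
sla_check 0 60 120 60 300 420   # prints: START_MISS END_MISS DURATION_MISS
# The same run with generous thresholds:
sla_check 0 60 300 180 30 150   # prints: none
```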
Finally, I'd like to change my script to run for 120 seconds instead of 60 to simulate a long-running job.
My script should look like so:
echo "start of script execution"
sleep 120
echo "end of script execution"
When ready, re-upload the script. Since the file already exists in HDFS, pass -f to hdfs dfs -put to overwrite it.
At this point, I want to make sure sending mail from the cluster is possible, and I will test that by sending a sample email. Enabling mail is beyond the scope of this tutorial. I followed the procedure below; adjust as necessary for your environment.
sudo su
yum install postfix
/etc/init.d/postfix restart
exit
Now we're able to send mail from our node. Mail needs to work on every node where Oozie may execute a workflow.
mail -s "test" email@example.com
Type the message body and press Ctrl-D. You should get an email shortly.
Finally, there are some changes we need to make on the Oozie side. I'm not going to enable JMS alerting and will concentrate only on the email piece. Please consult the Oozie docs for the JMS part.
This is HDP 2.5.3 and things may look/act differently on your Oozie instance.
Let's go to Ambari > Configs
filter by the following property
We're going to add these services to the existing list:
Once ready, add a couple more custom properties in Oozie. In my environment these properties did not exist.
and the value should be
The Oozie docs also recommend adding the following property to improve the performance of event processing. We're going to add this property and set its value to 15.
Once I saved the changes and restarted Oozie, it failed to start. Looking at the logs, I noticed the following in oozie-error.log:
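As a sanity check after saving, something like the sketch below can confirm the settings landed in the generated config. The service, listener and property names are the ones I recall from the Oozie SLA monitoring documentation linked at the top, so verify them against that page for your version. `check_sla_props` is a hypothetical helper, and the oozie-site.xml path in the usage line is an assumption about an HDP layout.

```shell
# check_sla_props FILE
# Hypothetical helper: reports which SLA-related names appear in FILE.
# The names below are taken from the Oozie SLA docs as I recall them;
# confirm them against the documentation for your Oozie version.
check_sla_props() (
  file=$1
  for name in \
    org.apache.oozie.service.EventHandlerService \
    org.apache.oozie.sla.service.SLAService \
    oozie.service.EventHandlerService.event.listeners \
    org.apache.oozie.sla.listener.SLAJobEventListener \
    org.apache.oozie.sla.listener.SLAEmailEventListener \
    oozie.service.SchedulerService.threads
  do
    # -F treats the name as a fixed string rather than a regex.
    if grep -qF "$name" "$file"; then
      echo "found   $name"
    else
      echo "MISSING $name"
    fi
  done
)

# Usage (the path is an assumption, adjust for your cluster):
# check_sla_props /etc/oozie/conf/oozie-site.xml
```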
2017-02-15 18:14:32,757 WARN ConfigUtils:523 - SERVER[wfmanager-test-1.openstacklocal] Using a deprecated configuration property [oozie.service.AuthorizationService.security.enabled], should use [oozie.service.AuthorizationService.authorization.enabled]. Please delete the deprecated property in order for the new property to take effect.
I was not able to delete the property, so I found it in Ambari > Configs and set it to false.
Once done, restart all Oozie services. You should now see a new tab in Oozie called SLA.
Remember, we only configured the email service, not JMS. We're ready to test our workflow. Before that, I'd like to preview the XML for good measure.
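If Oozie refuses to start after a config change, a quick way to spot this kind of warning is to pull the deprecated property names out of the log. This is a small sketch that matches the message format shown above. `find_deprecated_props` is a hypothetical helper, and the log path in the usage line is an assumption.

```shell
# find_deprecated_props LOGFILE
# Hypothetical helper: prints each property name that the log reports as a
# "deprecated configuration property", one per line, without duplicates.
find_deprecated_props() {
  grep -o 'deprecated configuration property \[[^]]*\]' "$1" \
    | sed 's/.*\[\(.*\)\]/\1/' \
    | sort -u
}

# Usage (the path is an assumption, adjust for your cluster):
# find_deprecated_props /var/log/oozie/oozie-error.log
```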
At this point, I'm ready to submit the workflow and watch my inbox. I'm expecting to miss my job start, job end and duration. This is an email output of my workflow.
Until next time folks!