Created on 02-11-2017 07:53 PM - edited 08-17-2019 04:54 AM
This is the second in a series of articles on WFM.
In this tutorial, we're going to import an existing workflow with a Python script wrapped in a shell action. Existing workflows can reside on HDFS or on your local file system. Let's fetch it onto the canvas.
My workflow is already on HDFS, so that's the option I select. The WFM view is integrated with a WebHDFS browser, which makes navigating the directory tree very easy.
Navigate to the directory in HDFS with the desired workflow and hit select. Once imported, WFM will run validation on the syntax and present it for further modification.
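WFM runs its own validation inside the view, but if you'd like a rough sanity check before importing, the sketch below parses a workflow definition with Python's standard library and lists its actions. The XML is a hypothetical stand-in rather than the actual workflow from this tutorial, and the check only confirms the file is well-formed XML; it does not validate against the Oozie schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow definition used only for illustration.
WORKFLOW_XML = """<workflow-app xmlns="uri:oozie:workflow:0.5" name="python-wf">
  <start to="python-node"/>
  <action name="python-node">
    <shell xmlns="uri:oozie:shell-action:0.3">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>script.py</exec>
      <file>scripts/script.py</file>
      <capture-output/>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Shell action failed</message>
  </kill>
  <end name="end"/>
</workflow-app>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def list_actions(xml_text):
    """Return (action name, action type) pairs; raises ParseError on malformed XML."""
    root = ET.fromstring(xml_text)
    actions = []
    for element in root:
        if local_name(element.tag) == "action":
            # The first child of <action> is the action type, e.g. <shell>.
            actions.append((element.get("name"), local_name(element[0].tag)))
    return actions


print(list_actions(WORKFLOW_XML))
```

If the file is not well-formed, `ET.fromstring` raises `xml.etree.ElementTree.ParseError`, which is a quick hint to fix the XML before handing it to WFM.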
Now you can modify the python-node by hovering over it and clicking the gear icon.
Once clicked, you can configure the rest of the action to your liking.
The action inherits all of the properties from your original workflow.
Notice you can specify a directory and script file in the File text box.
My Oozie workflow also carries existing properties, such as the YARN queue specification; WFM correctly parses and inherits that property.
Also notice I have capture output enabled, because I'd like to see what my script prints to the console.
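One caveat worth knowing, per the Oozie shell action documentation: with capture output enabled, Oozie expects the script's stdout to be in Java properties format (key=value lines) and limits its size (2 KB by default). Below is a minimal sketch of a script that prints in that shape. The key names are made up for illustration and are not part of the workflow in this tutorial.

```python
import platform


def format_properties(pairs):
    """Render (key, value) pairs as key=value lines, one per line."""
    return "\n".join("{0}={1}".format(key, value) for key, value in pairs)


# Hypothetical keys; any names work as long as each line is key=value.
output = format_properties([
    ("status", "ok"),
    ("python_version", platform.python_version()),
])
print(output)
```

A later node in the workflow could then read a captured value with an EL expression such as `${wf:actionData('python-node')['status']}`.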
At this point, I'm ready to preview my workflow. WFM comes with a handy XML preview.
Looks all right to me, so I'm ready to submit. Notice WFM doesn't know what ${jobTracker} is and prompts me to fill it in, along with the queue.
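To see ahead of time which parameters a workflow will need at submission, a small script can scan the XML for ${name} placeholders. This is only a sketch of the idea and not how WFM itself finds them; the snippet it scans is a hypothetical fragment, and the pattern deliberately skips EL function calls such as ${wf:id()}.

```python
import re

# Matches simple ${name} placeholders; EL functions like ${wf:id()} do not match.
PARAM_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def find_parameters(xml_text):
    """Return placeholder names in order of first appearance, without duplicates."""
    seen = []
    for name in PARAM_PATTERN.findall(xml_text):
        if name not in seen:
            seen.append(name)
    return seen


# Hypothetical fragment of a shell action.
SNIPPET = """
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<value>${queueName}</value>
<arg>${wf:id()}</arg>
<arg>${queueName}</arg>
"""

print(find_parameters(SNIPPET))
```

Each name in the result needs a value from somewhere, either from the job properties you supply or from whatever the submitting tool fills in for you.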
At this point we can navigate to the WFM Dashboard tab as you've seen in my previous tutorial and track the job status.
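If you prefer to check the status from a script instead of the dashboard, Oozie also exposes job information through its Web Services API. The sketch below assumes Python 3, an unsecured Oozie server (a Kerberized cluster would need SPNEGO authentication, which is not shown), and a hypothetical host name and job ID; substitute your own.

```python
import json
from urllib.request import urlopen


def job_info_url(oozie_url, job_id):
    """Build the Oozie REST URL that returns a job's info as JSON."""
    return "{0}/v1/job/{1}?show=info".format(oozie_url.rstrip("/"), job_id)


def fetch_job_status(oozie_url, job_id):
    """Query Oozie and return the job's status string, e.g. RUNNING or SUCCEEDED."""
    with urlopen(job_info_url(oozie_url, job_id)) as response:
        info = json.loads(response.read().decode("utf-8"))
    return info["status"]


# Hypothetical host and job ID; only the URL is built here, no request is sent.
print(job_info_url("http://oozie-host:11000/oozie",
                   "0000001-170211000000000-oozie-oozi-W"))
```

Calling `fetch_job_status` with your real Oozie URL and the job ID shown in the dashboard would return the same status the WFM Dashboard displays.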
My job failed. I can debug it directly from WFM.
It turns out the issue is with my ${jobTracker} parameter. In WFM, it was renamed to ${resourceManager} and is supplied by default, so I need to remove my custom parameter and let WFM do what it does best. Here's a preview of my XML after the change.
Back in the dashboard, I can click on the job and investigate the status.
My job completed successfully, and I can navigate to the YARN job status straight from WFM.
I need to click on my succeeded workflow and then click the arrow icon. It is on the right, in the same row as the python-node.
Finally, navigate to the logs of your YARN job to view the output.
And that's all for this tutorial. You learned how to import an existing Python Oozie workflow and edit it further via WFM. By the way, my Python script has the following code:
#!/usr/bin/env python
import os, pwd, sys

# pwd.getpwuid() returns a passwd record; index 0 (pw_name) is the user name
print("who am I? " + pwd.getpwuid(os.getuid())[0])
print("this is a Python script")
print("Python Interpreter Version: " + sys.version)
You can find my workflow, along with other samples, on my GitHub page: https://github.com/dbist/oozie