Created on 02-01-2016 06:53 PM
It is not completely obvious, but you can run Python scripts within Oozie workflows using the Shell action.
Here's a sample job.properties file, nothing special about it.
nameNode=hdfs://sandbox.hortonworks.com:8020
jobTracker=sandbox.hortonworks.com:8050
queueName=default
examplesRoot=oozie
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/python
Here's a sample workflow that looks for a script called script.py inside the scripts folder:
<workflow-app xmlns="uri:oozie:workflow:0.4" name="python-wf">
    <start to="python-node"/>
    <action name="python-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>script.py</exec>
            <file>scripts/script.py</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Python action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
Here's my sample script.py:
#! /usr/bin/env python
import os, pwd, sys

print "who am I? " + pwd.getpwuid(os.getuid())[0]
print "this is a Python script"
print "Python Interpreter Version: " + sys.version
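Because the workflow declares <capture-output/>, Oozie reads the script's stdout as Java properties (key=value lines), which a later node can look up with wf:actionData('python-node'). The variant below is only a sketch of a script that prints in that shape. The key names (user, python_version) and the helper names are my own assumptions, not part of the original sample, and the __future__ import is there so the same file runs under Python 2 and Python 3.

```python
#! /usr/bin/env python
# Sketch only: key names and helper names are illustrative assumptions.
from __future__ import print_function  # same print() syntax on Python 2 and 3

import os
import pwd
import sys


def collect_properties():
    """Gather the same facts the sample script prints, as a dict."""
    return {
        "user": pwd.getpwuid(os.getuid())[0],
        "python_version": sys.version.split()[0],
    }


def format_properties(props):
    """Render a dict as key=value lines, sorted by key for stable output."""
    return "\n".join("{0}={1}".format(key, props[key]) for key in sorted(props))


if __name__ == "__main__":
    # Everything written to stdout is what <capture-output/> picks up.
    print(format_properties(collect_properties()))
```

With output in this shape, a downstream node could presumably read a value with an expression such as ${wf:actionData('python-node')['user']}.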
The directory tree for my workflow, assuming the workflow directory is called python, looks like this:
[root@sandbox python]# tree
.
├── job.properties
├── scripts
│   └── script.py
└── workflow.xml

1 directory, 3 files
Now you can execute the workflow like any other Oozie workflow.
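For readers who have not submitted an Oozie workflow before, here is a rough sketch of the two usual steps (copy the workflow directory to HDFS, then submit with the oozie CLI), wrapped in Python to stay in the language of this article. The Oozie URL (default port 11000 on the sandbox host), the HDFS parent directory, and the function names build_commands and submit are assumptions for illustration; adjust them to your cluster.

```python
import subprocess

# Assumption: Oozie server on the sandbox host, listening on its default port.
OOZIE_URL = "http://sandbox.hortonworks.com:11000/oozie"


def build_commands(local_dir, hdfs_parent_dir, oozie_url=OOZIE_URL):
    """Return the commands that upload the workflow directory and submit it."""
    return [
        # Make sure the parent of the application path exists in HDFS.
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_parent_dir],
        # Copy the whole local workflow directory (workflow.xml, scripts/) under it.
        # This step fails if the target directory already exists in HDFS.
        ["hdfs", "dfs", "-put", local_dir, hdfs_parent_dir],
        # job.properties is read from the local file system by the CLI.
        ["oozie", "job", "-oozie", oozie_url,
         "-config", local_dir + "/job.properties", "-run"],
    ]


def submit(local_dir, hdfs_parent_dir):
    """Run each command in order, stopping at the first failure."""
    for command in build_commands(local_dir, hdfs_parent_dir):
        subprocess.check_call(command)
```

On the sandbox, as root, calling submit("python", "/user/root/oozie/apps") from the directory that contains the python folder would match the oozie.wf.application.path in the job.properties above.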
If you want to use Python 3, make sure Python 3 is installed on every node. My Python 3 script.py looks like this:
#! /usr/bin/env /usr/local/bin/python3.3
import os, pwd, sys

print("who am I? " + pwd.getpwuid(os.getuid())[0])
print("this is a Python script")
print("Python Interpreter Version: " + sys.version)
Everything else above holds true. You can find my sample workflow source code at the following link, including Python3.
Created on 07-29-2016 05:37 PM
There is a typo in the line below:
#! /usr/bin/env pythonimport os, pwd, sys
It should be like:
#! /usr/bin/env python
import os, pwd, sys
Created on 07-29-2016 06:33 PM
Thanks for fixing. It's not a typo, it's the code formatting in HCC. @Kuldeep Kulkarni
Created on 03-16-2017 07:30 PM
Hello Artem, I have a similar need and want to create a workflow using Python scripts. I tried your example, but for some reason it shows the following error message in the Oozie UI:
Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
I dug further and checked the Oozie server logs, but found nothing except the following:
2017-03-16 13:48:50,854 WARN ShellActionExecutor:523 -SERVER[xxx.xx.xxxx.] USER[xxx] GROUP[-] TOKEN[-] APP[hive2-wf] JOB[0002575-170307222346265-oozie-oozi-W] ACTION[0002575-170307222346265-oozie-oozi-W@shell-node] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
Next I checked the YARN logs under LogType:launch_container.sh and found the following:
LogType:stderr
Log Upload Time:Thu Mar 16 15:00:46 -0400 2017
LogLength:382
Log Contents:
./script.py: line 1: print: command not found
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
My script.py has only one line:
print "hello"
I'm able to run the Python file directly on the node, but the same script fails when submitted through Oozie.
Overall, my understanding is that the Python libraries are not available when the script is submitted through the Oozie server.
Can you please suggest a fix?
Created on 03-16-2017 07:54 PM
@Sam Pat First of all, thanks for checking out my article. I see a company reference in your error message; please edit your comment and remove it. Secondly, can you run your Python script without Oozie? I have a feeling you're trying to execute a Python 2 script with Python 3 as the default interpreter. You should add the interpreter line to your script and try again. Take a look at my scripts; I have a version for Python 2
#! /usr/bin/env python
and Python 3
#! /usr/bin/env /usr/local/bin/python3.3
If your cluster has Python 3 installed, make sure it is installed across the whole cluster and at the same path on every node. If it's Python 2, also make sure every node is configured correctly with the location of the interpreter.
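A note on the error itself: "./script.py: line 1: print: command not found" is the form of message a shell prints when it tries to run print as a command, which suggests the file was handed to a shell rather than to Python because it had no interpreter line. As a sketch only, a small pre-flight check like the one below could catch that before the script is uploaded. The function name check_script and its messages are hypothetical, not part of Oozie.

```python
def check_script(path):
    """Return a list of problems with the script's first line (empty if none)."""
    problems = []
    # Read as bytes so the check behaves the same on Python 2 and Python 3.
    with open(path, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        # Without an interpreter line the file may be run as a shell script.
        problems.append("missing interpreter line")
    elif b"python" not in first_line:
        problems.append("interpreter line does not mention python")
    return problems
```

For the one-line script in the question, check_script would report the missing interpreter line; for the sample scripts in the article it would return an empty list. It does not verify that the interpreter actually exists at that path on every node.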
Created on 03-17-2017 12:50 AM
Hello Artem, thanks, adding an interpreter line worked. I don't know how I could forget that. I think I'm doing a lot of multitasking. Also, I don't have Python 3 installed, so I was running on Python 2. Once again, thank you for the quick response. Really appreciate it. Sam