Master Mentor

It is not completely obvious, but you can run Python scripts within Oozie workflows using the Shell action.

Here's a sample job.properties file, nothing special about it.

nameNode=hdfs://sandbox.hortonworks.com:8020
jobTracker=sandbox.hortonworks.com:8050
queueName=default
examplesRoot=oozie
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/python
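To see which HDFS path the last property points at, here is a minimal Python sketch that expands the ${...} references by hand. It only illustrates the substitution and is not how Oozie itself resolves properties. The helper names (parse_properties, resolve) are made up for this example, and the value "root" for user.name is an assumption; Oozie takes it from the user who submits the job.

```python
import re


def parse_properties(text):
    """Parse simple key=value lines (no escapes or line continuations)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(value, props):
    """Expand ${name} references until nothing more can be replaced."""
    pattern = re.compile(r"\$\{([^}]+)\}")
    previous = None
    while previous != value:
        previous = value
        # Unknown names are left untouched so the loop always terminates.
        value = pattern.sub(lambda m: props.get(m.group(1), m.group(0)), value)
    return value


JOB_PROPERTIES = """\
nameNode=hdfs://sandbox.hortonworks.com:8020
jobTracker=sandbox.hortonworks.com:8050
queueName=default
examplesRoot=oozie
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/python
"""

props = parse_properties(JOB_PROPERTIES)
props["user.name"] = "root"  # assumption: the user submitting the workflow
app_path = resolve(props["oozie.wf.application.path"], props)
print(app_path)
```

With those assumptions the workflow directory has to be uploaded to /user/root/oozie/apps/python on HDFS.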

Here's a sample workflow that looks for a script called script.py inside the scripts folder.

<workflow-app xmlns="uri:oozie:workflow:0.4" name="python-wf">
    <start to="python-node"/>
    <action name="python-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>script.py</exec>
            <file>scripts/script.py</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Python action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
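A note on <capture-output/>: Oozie reads the action's stdout as a Java properties file (key=value lines, limited to roughly 2 KB), and later nodes can read the values with an expression such as ${wf:actionData('python-node')['python_version']}. If you want to pass values downstream, a script along these lines should work. This is only a sketch; the key names python_version and interpreter are invented for illustration.

```python
#! /usr/bin/env python
import sys


def format_properties(pairs):
    """Render (key, value) pairs as key=value lines, one pair per line."""
    return "\n".join("%s=%s" % (key, value) for key, value in pairs)


pairs = [
    # Hypothetical keys; pick whatever later workflow nodes need to read.
    ("python_version", "%d.%d" % sys.version_info[:2]),
    ("interpreter", sys.executable),
]
output = format_properties(pairs)

# Write only the key=value lines to stdout so the captured output stays parseable.
sys.stdout.write(output + "\n")
```

Free-form print statements like the ones in the sample script further down are fine for a quick test, but they do not give you useful keys to look up afterwards.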

Here's my sample script.py:

#! /usr/bin/env python
import os, pwd, sys
print "who am I? " + pwd.getpwuid(os.getuid())[0]
print "this is a Python script" 
print "Python Interpreter Version: " + sys.version

Assuming the workflow directory is called python, the directory tree for my workflow looks like this:

[root@sandbox python]# tree
.
├── job.properties
├── scripts
│   └── script.py
└── workflow.xml


1 directory, 3 files

Now you can execute the workflow like any other Oozie workflow.

If you want to use Python 3, make sure Python 3 is installed on every node. My Python 3 script.py looks like this:

#! /usr/bin/env /usr/local/bin/python3.3
import os, pwd, sys
print("who am I? " + pwd.getpwuid(os.getuid())[0])
print("this is a Python script")
print("Python Interpreter Version: " + sys.version)

Everything else above holds true. You can find my sample workflow source code at the following link, including Python3.
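If you would rather keep a single script for both interpreters, one option is to import print_function from __future__, which makes print() behave the same way on Python 2.6+ and Python 3. The sketch below is a variation on my script, not the version in the linked source. The current_user helper and its fallback to the numeric uid are additions for illustration, and the interpreter line assumes that python resolves to an installed interpreter on every node.

```python
#! /usr/bin/env python
from __future__ import print_function  # print() as a function on Python 2 as well

import os
import pwd
import sys


def current_user():
    """Return the login name for the current uid, or the uid itself if unknown."""
    try:
        return pwd.getpwuid(os.getuid())[0]
    except KeyError:
        # The uid has no passwd entry (can happen inside containers).
        return str(os.getuid())


lines = [
    "who am I? " + current_user(),
    "this is a Python script",
    "Python Interpreter Version: " + sys.version,
]

for line in lines:
    print(line)
```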

Comments
Master Guru

@Artem Ervits

There is a typo in the line below:

#! /usr/bin/env pythonimport os, pwd, sys

It should be like:

#! /usr/bin/env python
import os, pwd, sys
Master Mentor

Thanks for fixing. It's not a typo, it's the code formatting in HCC. @Kuldeep Kulkarni

New Contributor

Hello Artem, I have a similar need and want to create a workflow using Python scripts. I tried your example. For some reason it throws the following error message in the Oozie UI:

Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

I dug further and checked the Oozie server logs, but found nothing except the following:

2017-03-16 13:48:50,854  WARN ShellActionExecutor:523 -SERVER[xxx.xx.xxxx.] USER[xxx] GROUP[-] TOKEN[-] APP[hive2-wf] JOB[0002575-170307222346265-oozie-oozi-W] ACTION[0002575-170307222346265-oozie-oozi-W@shell-node] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

I then checked the YARN logs under LogType:launch_container.sh and found the following:

LogType:stderr 
Log Upload Time:Thu Mar 16 15:00:46 -0400 2017 
LogLength:382 
Log Contents:
./script.py: line 1: print: command not found 
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

My script.py has only one line:

print "hello"

I'm able to run the Python file directly on the node, but the same script does not work when submitted through Oozie.

Overall, my understanding is that the Python libraries are not available when the script is submitted through the Oozie server.

Can you please suggest a fix?

Master Mentor

@Sam Pat First of all, thanks for checking out my article. I see you have a company reference in your error message; please edit your comment and remove it. Secondly, can you run your Python script without Oozie? I have a feeling you're trying to execute a Python 2 script with Python 3 as the default interpreter. You should add the interpreter line to your script and try again. Take a look at my scripts. I have a version for Python 2

#! /usr/bin/env python

and Python 3

#! /usr/bin/env /usr/local/bin/python3.3

If your cluster has Python 3 installed, make sure it is installed across the whole cluster and at the same path. If it's Python 2, also make sure every node is configured correctly with the location of the interpreter.
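For what it's worth, the "print: command not found" line in the log above looks like a shell error rather than a Python one, which fits a script that has no interpreter line and ends up being run by the shell. If you want to catch that before uploading to HDFS, a small check along these lines might help. It is only a sketch: has_python_shebang is a made-up helper, and it just inspects the first line of the script text.

```python
def has_python_shebang(script_text):
    """Return True if the first line is a #! line that mentions python."""
    first_line = script_text.splitlines()[0] if script_text else ""
    return first_line.startswith("#!") and "python" in first_line


# The one-line script from the question has no interpreter line.
broken = 'print "hello"\n'
# The same script with the Python 2 interpreter line from the article.
fixed = '#! /usr/bin/env python\nprint "hello"\n'

print(has_python_shebang(broken))
print(has_python_shebang(fixed))
```

The first call reports False and the second True. The check says nothing about whether the interpreter actually exists at that path on every node, so that still needs to be verified on the cluster.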

New Contributor

Hello Artem, thanks, adding an interpreter line worked. I don't know how I could forget that. I think I'm doing too much multitasking. Also, I don't have Python 3 installed, so I was running on Python 2. Once again, thank you for the quick response. I really appreciate it. Sam