
Scheduling a Python script in OOZIE

New Contributor

Hi,

I have created a Python script that pulls an RSS feed and writes the output to a text file. I would like to run the script once a day. Can this be done using Oozie? Please feel free to suggest a better solution.

Thanks for your help in advance!

Accepted Solution

Re: Scheduling a Python script in OOZIE

Mentor

@Bidyut B

This is a common request, and I was also interested in how to do it. As @Benjamin Leonhardi stated, you use the standard shell action. Here's a sample Python workflow I created and tested on the current release of the Sandbox. If you want to use Python 3 with Oozie, I added an example of that too, though it shouldn't be much different.
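The original question asks for a once-a-day run. A workflow on its own runs once per submission, so the usual way to repeat it is to wrap it in an Oozie coordinator. The following is a minimal sketch, not the sample linked in the answer; the parameter names workflowAppPath, startTime and endTime are hypothetical and would be defined in job.properties.

```xml
<!-- coordinator.xml: minimal sketch that materializes one workflow run per day.
     workflowAppPath, startTime and endTime are hypothetical parameters from job.properties;
     times use Oozie's UTC format, e.g. 2016-02-01T06:00Z -->
<coordinator-app name="daily-python" xmlns="uri:oozie:coordinator:0.4"
                 frequency="${coord:days(1)}"
                 start="${startTime}" end="${endTime}" timezone="UTC">
    <action>
        <workflow>
            <!-- HDFS directory that contains the shell-action workflow.xml -->
            <app-path>${workflowAppPath}</app-path>
        </workflow>
    </action>
</coordinator-app>
```

To submit it, set oozie.coord.application.path (instead of oozie.wf.application.path) in job.properties to the HDFS directory holding coordinator.xml, then run: oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run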


Re: Scheduling a Python script in OOZIE

Yes, this can be done in Oozie. I would suggest a shell action. You need to upload all the files you need (libraries etc.) by adding them in file tags. For example, I normally have a shell script that runs kinit for Kerberos if needed (you would need to upload the keytab as well) and then executes the Python scripts with parameters such as outputFolder.

Note that the action can run on any DataNode, so all of them need access to your RSS feed. Alternatively, you could use an SSH action to connect to an edge node.

<action name="mypython">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <!-- Wrapper script, e.g. runs kinit if needed and then calls mypython.py -->
        <exec>setupAndRun.sh</exec>
        <!-- Parameters handed to the script as environment variables -->
        <env-var>outputFolder=${outputFolder}</env-var>
        <env-var>targetFolder=${targetFolder}</env-var>
        <!-- Files are made available in the action's working directory;
             the part after # is the local file name -->
        <file>${nameNode}/hdfsfolder/setupAndRun.sh#setupAndRun.sh</file>
        <file>${nameNode}/hdfsfolder/mypython.py#mypython.py</file>
    </shell>
    <ok to="end" />
    <error to="kill" />
</action>
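For the SSH alternative mentioned above, here is a minimal sketch of an SSH action. The user, host name and script path are hypothetical placeholders; the sketch assumes passwordless SSH is already set up from the Oozie server's user to that account on the edge node, and that python is on that account's PATH.

```xml
<!-- Alternative sketch: run the script on an edge node over SSH.
     myuser, edgenode.example.com and the script path are hypothetical placeholders. -->
<action name="mypython-ssh">
    <ssh xmlns="uri:oozie:ssh-action:0.1">
        <!-- Requires passwordless SSH from the Oozie server's user to this account -->
        <host>myuser@edgenode.example.com</host>
        <command>python</command>
        <!-- Script path on the edge node's local file system, not HDFS -->
        <args>/home/myuser/mypython.py</args>
    </ssh>
    <ok to="end" />
    <error to="kill" />
</action>
```

Because the command runs on the edge node itself, only that host needs access to the RSS feed.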

Re: Scheduling a Python script in OOZIE

New Contributor

@Benjamin Leonhardi and @Artem Ervits

Thank you so much for your detailed responses! I should have mentioned that I am working on the Sandbox and on Windows. I tried to create the job through Ambari->Oozie->Oozie Web UI->Coordinator but there is no feature to create a job. How do I create the job using the UI?

Re: Scheduling a Python script in OOZIE

New Contributor

After some searching, I found that I need Hue to create Oozie jobs from a UI. Now the fun begins; I would like to list the steps for my future reference and for other users.

Hue can be accessed at http://YourHostName:8000 (thanks to this post https://martin.atlassian.net/wiki/pages/viewpage.action?pageId=22839304 for the hint on the Hue web address).

Once in Hue, go to http://YourHostName:8000/oozie/list_coordinators/ and create a new coordinator by clicking the 'Create' button on the right-hand side.

The coordinator requires a workflow, which can be created at http://YourHostName:8000/oozie/list_workflows/

Re: Scheduling a Python script in OOZIE

Mentor

@Bidyut B, if you can, please create an article, preferably with images, on how to create a coordinator workflow with Hue.

Re: Scheduling a Python script in OOZIE

New Contributor

@Artem Ervits

Thanks for the suggestion! I will create an article. Let me know if I need to submit it for review.

Re: Scheduling a Python script in OOZIE

New Contributor

Hi @Artem Ervits and @Benjamin Leonhardi,

For the last two days I have been trying to execute a Python script using an Oozie workflow with Hue->Workflows->Shell action, but I keep getting the message 'Couldn't save the workflow'.

The script 'hello.py' contains

print("Hello, World!")

and is placed in the /user/oozie folder with full permissions.

My shell action configuration is in the attached workflow file.

The params are:

http://localhost:8050

hdfs://sandbox.hortonworks.com:8020

default

In the Job properties section, the property name is 'job-tracker' and the value is shown in the attached file (jobproperties.png).


Attachments: hue-shell-action.png, jobproperties.png
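For comparison, here is a minimal hand-written workflow.xml sketch for the hello.py case, which can be submitted without Hue. It assumes a python interpreter is on the PATH of every NodeManager host and reuses the parameter values listed in the post as defaults. Note that Oozie's job-tracker value is a host:port pair (the ResourceManager address), not an http:// URL.

```xml
<!-- workflow.xml: minimal sketch for running /user/oozie/hello.py through a shell action.
     Assumes python is on the PATH of every NodeManager host. -->
<workflow-app name="hello-python" xmlns="uri:oozie:workflow:0.4">
    <parameters>
        <!-- Defaults only; values in job.properties override them.
             job-tracker takes host:port, without an http:// prefix -->
        <property><name>jobTracker</name><value>localhost:8050</value></property>
        <property><name>nameNode</name><value>hdfs://sandbox.hortonworks.com:8020</value></property>
        <property><name>queueName</name><value>default</value></property>
    </parameters>
    <start to="hello"/>
    <action name="hello">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <!-- Run the interpreter and pass the localized script as its argument -->
            <exec>python</exec>
            <argument>hello.py</argument>
            <!-- Makes hello.py from HDFS available in the action's working directory -->
            <file>${nameNode}/user/oozie/hello.py#hello.py</file>
        </shell>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Shell action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The script's "Hello, World!" output should then appear in the stdout log of the action's launcher job.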

Re: Scheduling a Python script in OOZIE

Mentor

@Bidyut B, with what I gave you, you don't need Hue. Just use my directories and adjust them to your setup. If you get specific errors, open a new question.

Re: Scheduling a Python script in OOZIE

New Contributor
@Artem Ervits

Sure, I am trying the same example now. I copied the files to the 'Oozie' folder on HDFS (screenshot 'files'):

1801-files.png

However, I encountered a JVM-related error (screenshot attached). To start Oozie, I navigated to the folder /usr/lib/oozie/bin/ and executed the script 'oozie-start.sh'.

1780-oozie-error.png

I am not sure if this is the correct way to execute the job. Any help would be greatly appreciated.
