Scheduling a Python script in OOZIE

Contributor

Hi,

I have created a Python script. The script pulls an RSS feed and writes the output to a text file. I would like to run the Python job once a day. Can this be done using Oozie? Please feel free to suggest a better solution.

Thanks for your help in advance!

1 ACCEPTED SOLUTION

Master Mentor

@Bidyut B

This is a common request, and I was also interested in how to do it. As @Benjamin Leonhardi stated, you use the standard shell action. Here's a sample Python workflow I created and tested on the current release of the Sandbox. If you want to use Python 3 with Oozie, I added an example of that too, though it shouldn't be much different.

Master Guru

Yes, this can be done in Oozie. I would suggest a shell action. You need to upload all the files the action needs (libraries, etc.) by adding them in file tags. For example, I normally have a shell script that runs kinit for Kerberos if needed (you would need to upload the keytab as well) and then executes the Python scripts with parameters such as outputFolder.

Note that the action can run on any datanode, so all of them need access to your RSS feed. Alternatively, you could use an SSH action to connect to an edge node.

<action name="mypython">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <exec>setupAndRun.sh</exec>
        <env-var>outputFolder=${outputFolder}</env-var>
        <env-var>targetFolder=${targetFolder}</env-var>
        <file>${nameNode}/hdfsfolder/setupAndRun.sh#setupAndRun.sh</file>
        <file>${nameNode}/hdfsfolder/mypython.py#mypython.py</file>
    </shell>
    <ok to="end" />
    <error to="kill" />
</action>
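
The shell action above runs once per workflow submission. To run it once a day, as the original question asks, the workflow that contains it can be wrapped in an Oozie coordinator. The following is only a minimal sketch under stated assumptions: the parameter names workflowAppUri, startTime and endTime are illustrative and do not come from this thread, and they would have to be defined in the job properties along with the parameters the workflow already uses.

```xml
<!-- Sketch of a coordinator.xml that triggers the workflow once a day.
     workflowAppUri, startTime and endTime are hypothetical parameter names. -->
<coordinator-app name="daily-python"
                 frequency="${coord:days(1)}"
                 start="${startTime}" end="${endTime}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <!-- HDFS directory that holds the workflow.xml with the shell action -->
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <!-- Optional: pass a value through to the workflow explicitly -->
                <property>
                    <name>outputFolder</name>
                    <value>${outputFolder}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```

The start and end values are expected in UTC in the form 2016-01-01T00:00Z. The coordinator is submitted instead of the workflow, and it then materializes one workflow run per day between those two times.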

Contributor

@Benjamin Leonhardi and @Artem Ervits

Thank you so much for your detailed responses! I should have mentioned that I am working on the Sandbox and on Windows. I tried to create the job through Ambari->Oozie->Oozie Web UI->Coordinator but there is no feature to create a job. How do I create the job using the UI?

Contributor

After some searching I found that I need Hue to create Oozie jobs. OK, now the fun begins. I would like to list the steps for my future reference and for other users.

Hue can be accessed at

http://YourHostName:8000 (thanks to this post for the hint on the Hue web address: https://martin.atlassian.net/wiki/pages/viewpage.action?pageId=22839304). Once in Hue, go to

http://YourHostName:8000/oozie/list_coordinators/ and create a new coordinator by clicking the 'Create' button on the right-hand side.

The coordinator requires a workflow, which can be created at

http://YourHostName:8000/oozie/list_workflows/

Master Mentor

@Bidyut B please create an article, preferably with images, on how to create a coordinator workflow with Hue, if you can.

Contributor

@Artem Ervits

Thanks for the suggestion! I will create an article. Let me know if I need to submit it for review.

Contributor

Hi @Artem Ervits and @Benjamin Leonhardi,

I have been trying for the last two days to execute a Python script from an Oozie workflow using Hue->Workflows->Shell action, but I keep getting the message 'Couldn't save the workflow'.

The script 'hello.py' contains

print("Hello, World!")

and is placed in the /user/oozie folder with all permissions granted.

My Shell action contains the attached workflow file.

The params are

http://localhost:8050

hdfs://sandbox.hortonworks.com:8020

default

In the Job properties section, the property name is 'job-tracker' and the value is shown in the attached file (jobproperties.png).


hue-shell-action.png, jobproperties.png
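
For comparison with the Hue form, a hand-written workflow.xml that runs hello.py through a shell action might look like the sketch below. It assumes that a python executable is on the PATH of every node that can run the action, and the workflow and node names (hello-python, run-hello, fail) are made up for illustration. The three parameters correspond to the values listed in the post. Note that jobTracker is normally given as host:port rather than as an http:// URL; whether that is related to the save error reported here is not established.

```xml
<!-- Sketch only: minimal workflow that runs "python hello.py" once. -->
<workflow-app name="hello-python" xmlns="uri:oozie:workflow:0.4">
    <start to="run-hello"/>
    <action name="run-hello">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <!-- Assumes python is installed on the node running the action -->
            <exec>python</exec>
            <argument>hello.py</argument>
            <!-- Copies hello.py from HDFS into the action's working directory -->
            <file>${nameNode}/user/oozie/hello.py#hello.py</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The output of print() ends up in the stdout log of the launcher job for the action, not in the Oozie console.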

Master Mentor

@Bidyut B with what I gave you, you don't need Hue. Just use my directories and adjust them. If you get specific errors, open a new question.

Contributor
@Artem Ervits

Sure, I am trying the same example now. I copied the files to the 'Oozie' folder on HDFS (screenshot 'files'):

1801-files.png

However, I encountered a JVM-related error (screenshot attached). To start Oozie, I navigated to the folder

/usr/lib/oozie/bin/ and executed the script 'oozie-start.sh'.

1780-oozie-error.png

I am not sure if this is the correct way to execute the job. Any help would be greatly appreciated.