Created 01-29-2016 08:26 AM
Hi,
I have created a Python script. The script pulls an RSS feed and writes the output to a text file. I would like to execute the Python job once a day. Can this be done using Oozie? Please feel free to suggest a better solution.
Thanks for your help in advance!
Created 01-31-2016 05:30 PM
This is a common request, and I was also interested in how to do it. As @Benjamin Leonhardi stated, you can use the standard shell action. Here's a sample Python workflow I created and tested on the current release of the Sandbox. If you want to use Python 3 with Oozie, I added an example of that too, though it shouldn't be much different.
Created 01-29-2016 09:43 AM
Yes, this can be done in Oozie. I would suggest a shell action. You need to upload all the files you need (libraries etc.) by adding them in file tags. For example, I normally have a shell script that does a kinit for Kerberos if needed (you would need to upload the keytab as well) and then executes the Python scripts with parameters like outputFolder.
Note that this can run on any datanode, so all of them need access to your RSS feed. However, you could also use an SSH action to connect to an edge node.
<action name="mypython">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <exec>setupAndRun.sh</exec>
        <env-var>outputFolder=${outputFolder}</env-var>
        <env-var>targetFolder=${targetFolder}</env-var>
        <file>${nameNode}/hdfsfolder/setupAndRun.sh#setupAndRun.sh</file>
        <file>${nameNode}/hdfsfolder/mypython.py#mypython.py</file>
    </shell>
    <ok to="end" />
    <error to="kill" />
</action>
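For illustration, here is a minimal sketch of what a script like mypython.py could look like when it is launched from the shell action above. It is not the original poster's script. It assumes Python 3, an RSS 2.0 feed, and a hypothetical `feedUrl` environment variable that would need its own env-var entry; only `outputFolder` comes from the action shown. The file is written to the local file system of whichever node runs the action, so copying it to HDFS would be a separate step.

```python
import os
import xml.etree.ElementTree as ET


def extract_titles(rss_xml):
    """Return the <title> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("title", default="").strip() for item in root.iter("item")]


def write_titles(titles, output_folder, filename="feed.txt"):
    """Write one title per line into output_folder and return the file path."""
    os.makedirs(output_folder, exist_ok=True)
    path = os.path.join(output_folder, filename)
    with open(path, "w", encoding="utf-8") as handle:
        for title in titles:
            handle.write(title + "\n")
    return path


def main():
    from urllib.request import urlopen

    # outputFolder is passed in by the <env-var> element of the shell action.
    output_folder = os.environ.get("outputFolder", ".")
    # feedUrl is a hypothetical variable, not part of the action above.
    feed_url = os.environ.get("feedUrl")
    if not feed_url:
        print("feedUrl is not set, nothing to download")
        return
    with urlopen(feed_url) as response:
        rss_xml = response.read()
    print(write_titles(extract_titles(rss_xml), output_folder))


if __name__ == "__main__":
    main()
```

Keeping the parsing and writing in separate functions makes it possible to test the script locally with a saved copy of the feed before handing it to Oozie.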
Created 02-01-2016 05:44 AM
@Benjamin Leonhardi and @Artem Ervits
Thank you so much for your detailed responses! I should have mentioned that I am working on the Sandbox and on Windows. I tried to create the job through Ambari->Oozie->Oozie Web UI->Coordinator but there is no feature to create a job. How do I create the job using the UI?
Created 02-01-2016 06:22 AM
After some searching I found that I need Hue to create Oozie jobs. Ok, now the fun begins, I would like to list the steps for my future reference and for other users.
Hue can be accessed at
http://YourHostName:8000 (thanks to this post for the hint on the Hue web address: https://martin.atlassian.net/wiki/pages/viewpage.action?pageId=22839304). Once in Hue, go to
http://YourHostName:8000/oozie/list_coordinators/ and create a new coordinator by clicking the 'Create' button on the right-hand side.
The Coordinator will require a workflow, the workflow can be created at
Created 02-01-2016 11:00 AM
@Bidyut B please create an article, preferably with images, on how to create a coordinator workflow with Hue, if you can.
Created 02-02-2016 07:03 AM
Created 02-03-2016 11:08 AM
Hi @Artem Ervits and @Benjamin Leonhardi,
For the last two days I have been trying to execute a Python script using an Oozie workflow with Hue->Workflows->Shell action, but I am getting the message 'Couldn't save the workflow'.
The script 'hello.py' contains
print("Hello, World!") and is placed under the /user/oozie folder with all permissions.
My Shell action contains the attached workflow file.
The params are
hdfs://sandbox.hortonworks.com:8020
default
In the Job properties section, the property name is
'job-tracker' and the value is in the attached file (jobproperties.png).
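As a rough sketch of how the same parameters could be kept in a job.properties file instead of being typed into Hue, the snippet below renders them as text. The names `nameNode`, `jobTracker` and `queueName` match the variables used in the shell action earlier in the thread, and `oozie.wf.application.path` is the standard Oozie property pointing at the workflow directory. The JobTracker value is only shown in the attached image, so it is left as a placeholder, and using /user/oozie as the workflow directory is an assumption.

```python
def render_job_properties(name_node, job_tracker, queue_name, app_path):
    """Return the text of a job.properties file for an Oozie workflow."""
    lines = [
        "nameNode=" + name_node,
        "jobTracker=" + job_tracker,
        "queueName=" + queue_name,
        # Oozie resolves ${nameNode} from the first line when the job is submitted.
        "oozie.wf.application.path=${nameNode}" + app_path,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_job_properties(
        name_node="hdfs://sandbox.hortonworks.com:8020",
        job_tracker="REPLACE_WITH_JOBTRACKER_HOST:PORT",  # placeholder, value not given in the post
        queue_name="default",
        app_path="/user/oozie",  # assumed location of workflow.xml
    ))
```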
Created 02-03-2016 11:45 AM
@Bidyut B with what I gave you, you don't need Hue. Just use my directories and adjust. If you get specific errors, open a new question.
Created on 02-04-2016 07:39 AM - edited 08-19-2019 03:48 AM
Sure, I am trying the same example now. I copied the files to the 'Oozie' folder on HDFS (screenshot 'files'),
but encountered an error related to the JVM (screenshot attached). To start Oozie I navigated to the folder
/usr/lib/oozie/bin/ and executed the script 'oozie-start.sh'.
I am not sure if this is the correct way to execute the job. Any help would be greatly appreciated.