01-06-2017
10:27 PM
You should consider running Hadoop Streaming with your Python mapper and reducer. Take a look at https://oozie.apache.org/docs/4.2.0/WorkflowFunctionalSpec.html#a3.2.2.3_Streaming for an example of such a workflow. First, try executing your streaming job directly with something like:

yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /user/theuser/input.csv -output /user/theuser/out

Then it will be easier to schedule it with Oozie; worst case, you can wrap the same command in a shell action. Please accept the answer if it resolved your question.
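For reference, here is a minimal sketch of the logic that mapper.py and reducer.py might contain for a word-count streaming job. This is an illustration only: the file names match the command above, but the word-count logic is an assumption, not the asker's actual job. In the real scripts, each would read lines from sys.stdin and print tab-separated key/value pairs, since that is the contract Hadoop Streaming imposes.

```python
#!/usr/bin/env python
# Sketch of Hadoop Streaming mapper/reducer logic (word count assumed).
# In practice mapper.py and reducer.py are two separate executable scripts,
# each reading sys.stdin and writing "key\tvalue" lines to stdout.
import sys
from itertools import groupby
from operator import itemgetter


def map_line(line):
    """Mapper logic: emit a (word, 1) pair for every word on the line."""
    return [(word, 1) for word in line.strip().split()]


def reduce_sorted(pairs):
    """Reducer logic: sum counts per key.

    Hadoop Streaming delivers the mapper output to the reducer sorted by
    key, so grouping adjacent pairs with the same key is sufficient.
    """
    return [(key, sum(count for _, count in group))
            for key, group in groupby(pairs, key=itemgetter(0))]


if __name__ == "__main__":
    # Mapper-style main loop: read stdin, write "word\t1" lines.
    for line in sys.stdin:
        for word, count in map_line(line):
            print("%s\t%d" % (word, count))
```

Remember to make both scripts executable (chmod +x) and keep the shebang line, otherwise the streaming job will fail with "Broken pipe" style errors.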
01-06-2017
10:37 PM
It's good practice to accept an answer if it satisfies your needs.
12-09-2016
09:32 PM
5 Kudos
@justlearning
You can use the standard Apache Oozie examples and modify them to fit your requirements; this is the easiest way to get started writing Oozie workflows. The bundle includes an example workflow.xml for each supported action. On an HDP cluster (provided the Oozie client is installed), you can find the examples at /usr/hdp/current/oozie-client/doc/oozie-examples.tar.gz. Hope this information helps!
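To give a flavor of what those example workflows look like, here is a sketch of a workflow.xml using the streaming element of the map-reduce action, in the style of the Oozie 4.2 functional specification. The workflow name, property names like inputDir/outputDir, and the Python file names are placeholders, not values from this thread.

```xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="streaming-wf">
    <start to="streaming-node"/>
    <action name="streaming-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <streaming>
                <mapper>python mapper.py</mapper>
                <reducer>python reducer.py</reducer>
            </streaming>
            <configuration>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
            <file>mapper.py#mapper.py</file>
            <file>reducer.py#reducer.py</file>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Streaming job failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The &lt;file&gt; elements ship the scripts into the job's distributed cache, which plays the same role as -files in the direct yarn jar invocation.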
12-09-2016
08:41 PM
Oozie is a little old school. Have you thought about using HDF or Apache Falcon? Both are a bit more feature-rich. What are you trying to do?