01-06-2017
10:27 PM
You should consider running Hadoop Streaming with your Python mapper and reducer. Take a look at https://oozie.apache.org/docs/4.2.0/WorkflowFunctionalSpec.html#a3.2.2.3_Streaming for an example of such a workflow. First, try executing your streaming job directly with something like:

yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /user/theuser/input.csv -output /user/theuser/out

Then it will be easier to schedule it with Oozie; worst case, you can wrap the same command in a shell action. Please accept the answer if it resolved your question.
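For reference, here is a minimal sketch of the logic that mapper.py and reducer.py might contain for a word-count streaming job. This is an illustration only: the file names match the command above, but the word-count logic is an assumption, not the asker's actual job. In the real scripts, each would read lines from sys.stdin and print tab-separated key/value pairs, since that is the contract Hadoop Streaming imposes.

```python
#!/usr/bin/env python
# Sketch of Hadoop Streaming mapper/reducer logic (word count assumed).
# In practice mapper.py and reducer.py are two separate executable scripts,
# each reading sys.stdin and writing "key\tvalue" lines to stdout.
import sys
from itertools import groupby
from operator import itemgetter


def map_line(line):
    """Mapper logic: emit a (word, 1) pair for every word on the line."""
    return [(word, 1) for word in line.strip().split()]


def reduce_sorted(pairs):
    """Reducer logic: sum counts per key.

    Hadoop Streaming delivers the mapper output to the reducer sorted by
    key, so grouping adjacent pairs with the same key is sufficient.
    """
    return [(key, sum(count for _, count in group))
            for key, group in groupby(pairs, key=itemgetter(0))]


if __name__ == "__main__":
    # Mapper-style main loop: read stdin, write "word\t1" lines.
    for line in sys.stdin:
        for word, count in map_line(line):
            print("%s\t%d" % (word, count))
```

Remember to make both scripts executable (chmod +x) and keep the shebang line, otherwise the streaming job will fail with "Broken pipe" style errors.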
01-06-2017
10:37 PM
It's good practice to accept an answer if it satisfies your needs.
12-09-2016
09:32 PM
5 Kudos
@justlearning
You can use the standard Apache Oozie examples and modify them to fit your requirements; this is the easiest way to get started writing Oozie workflows. The bundle includes an example workflow.xml for each supported action. On an HDP cluster (provided the Oozie client is installed), you can find the examples at /usr/hdp/current/oozie-client/doc/oozie-examples.tar.gz. Hope this information helps!
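To give a flavor of what those example workflows look like, here is a sketch of a workflow.xml using the streaming element of the map-reduce action, in the style of the Oozie 4.2 functional specification. The workflow name, property names like inputDir/outputDir, and the Python file names are placeholders, not values from this thread.

```xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="streaming-wf">
    <start to="streaming-node"/>
    <action name="streaming-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <streaming>
                <mapper>python mapper.py</mapper>
                <reducer>python reducer.py</reducer>
            </streaming>
            <configuration>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
            <file>mapper.py#mapper.py</file>
            <file>reducer.py#reducer.py</file>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Streaming job failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The &lt;file&gt; elements ship the scripts into the job's distributed cache, which plays the same role as -files in the direct yarn jar invocation.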
12-09-2016
08:41 PM
Oozie is a little old school. Have you thought about using HDF or Apache Falcon? Both are a bit more feature-rich. What are you trying to do?