
Running/scheduling an Oozie job with MapReduce scripts (written in Python)


I think I am facing a configuration issue. Here is the workflow.xml file I am using in an attempt to run/submit the Oozie job with MapReduce scripts. I am using the command: $ oozie mapreduce -oozie http://localhost:11000/oozie -config job.properties

According to:

http://oozie.apache.org/docs/4.1.0/DG_CommandLineTool.html#Oozie_Command_Line_Usage

"the parameters must be in the Java Properties file (.properties). This file must be specified for a map-reduce job. The properties file must specify the mapred.mapper.class , mapred.reducer.class , mapred.input.dir , mapred.output.dir , =oozie.libpath=, mapred.job.tracker , and fs.default.name properties."

The map-reduce job will be created and submitted. All jar files and all other files needed by the mapreduce job need to be uploaded onto HDFS under libpath beforehand. The workflow.xml will be created in Oozie server internally. Users can get the workflow.xml from console or command line(-definition).
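Putting the quoted requirements together, a minimal job.properties for this submission mode might look like the sketch below. All hostnames, ports, paths, and classes are placeholders (not taken from this thread) and must be adjusted to your cluster; the identity mapper/reducer classes are just stand-ins from the old mapred API.

```properties
# Placeholder values -- adjust to your cluster
fs.default.name=hdfs://namenode-host:8020
mapred.job.tracker=jobtracker-host:8032
mapred.mapper.class=org.apache.hadoop.mapred.lib.IdentityMapper
mapred.reducer.class=org.apache.hadoop.mapred.lib.IdentityReducer
mapred.input.dir=/user/theuser/input
mapred.output.dir=/user/theuser/output
# libpath where the job's jars and supporting files were uploaded on HDFS
oozie.libpath=hdfs://namenode-host:8020/user/theuser/share/lib
```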

However, I am getting this error (attached screenshot: mapreduce-oozie-error.png).

I am not sure how to configure my workflow.xml file, or what the procedure is to successfully execute an Oozie job with a MapReduce script written in Python. @Artem Ervits

1 ACCEPTED SOLUTION

3 REPLIES



OK, thank you @Laurent Edel.

How do I configure my workflow.xml file, or what is the procedure to successfully execute an Oozie job with a MapReduce script written in Python?


You should consider running Hadoop Streaming with your Python mapper and reducer.

Take a look at https://oozie.apache.org/docs/4.2.0/WorkflowFunctionalSpec.html#a3.2.2.3_Streaming for an example of such a workflow.

First, try executing your streaming job directly with something like:

yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /user/theuser/input.csv -output /user/theuser/out
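For the streaming command above, mapper.py and reducer.py just read lines from stdin and write tab-separated key/value records to stdout. As a hedged sketch (a classic word count, not the poster's actual scripts), the core logic of both sides could look like this, with map_line() playing the mapper role and reduce_lines() the reducer role:

```python
#!/usr/bin/env python
# Hypothetical word-count logic for a Hadoop Streaming mapper/reducer pair.
# In a real job, mapper.py and reducer.py would each loop over sys.stdin
# and print these records to stdout.

def map_line(line):
    """Mapper side: emit one 'word<TAB>1' record per word in the line."""
    return ["%s\t1" % word for word in line.strip().split()]

def reduce_lines(lines):
    """Reducer side: sum counts per key.

    Streaming delivers reducer input sorted by key, so all records for
    one key arrive together; this sketch tallies them in a dict while
    preserving first-seen key order.
    """
    counts = {}
    order = []
    for line in lines:
        key, value = line.rstrip("\n").split("\t")
        if key not in counts:
            counts[key] = 0
            order.append(key)
        counts[key] += int(value)
    return ["%s\t%d" % (key, counts[key]) for key in order]

if __name__ == "__main__":
    for record in map_line("hello world hello"):
        print(record)
```

Note that the scripts must be executable (or invoked via `python mapper.py`) and must be shipped with the job, which the `-files mapper.py,reducer.py` option in the command above takes care of.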

Then it'll be easier to schedule that with Oozie; worst case, you can wrap that command in a shell action.
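To give a rough idea of the Oozie side, a streaming map-reduce action in workflow.xml could look something like the sketch below. This is an assumption-laden template, not a verified configuration: the app name, the ${jobTracker}/${nameNode} parameters, and the input/output paths are placeholders, and the authoritative schema is the WorkflowFunctionalSpec page linked above.

```xml
<workflow-app name="python-streaming-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="streaming-node"/>
    <action name="streaming-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <streaming>
                <mapper>python mapper.py</mapper>
                <reducer>python reducer.py</reducer>
            </streaming>
            <configuration>
                <property>
                    <name>mapred.input.dir</name>
                    <value>/user/theuser/input.csv</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>/user/theuser/out</value>
                </property>
            </configuration>
            <!-- Ship the Python scripts into the job's working directory -->
            <file>mapper.py#mapper.py</file>
            <file>reducer.py#reducer.py</file>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Streaming job failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```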

Please accept the answer if it answered your question.