running/scheduling an Oozie job(s) with mapreduce scripts(written in Python)

I think I am facing a configuration issue. Here is the workflow.xml file I am using in an attempt to run/submit the Oozie job with mapreduce scripts. I am using the command: $ oozie mapreduce -oozie http://localhost:11000/oozie -config job.properties

According to:

http://oozie.apache.org/docs/4.1.0/DG_CommandLineTool.html#Oozie_Command_Line_Usage

"the parameters must be in the Java Properties file (.properties). This file must be specified for a map-reduce job. The properties file must specify the mapred.mapper.class, mapred.reducer.class, mapred.input.dir, mapred.output.dir, oozie.libpath, mapred.job.tracker, and fs.default.name properties.

The map-reduce job will be created and submitted. All jar files and all other files needed by the mapreduce job need to be uploaded onto HDFS under libpath beforehand. The workflow.xml will be created in Oozie server internally. Users can get the workflow.xml from console or command line (-definition)."

However, I am getting this error: mapreduce-oozie-error.png

I am not sure how to configure my workflow.xml file, or what the procedure is to successfully execute an Oozie job with a mapreduce script written in Python. @Artem Ervits

Accepted Solutions

Re: running/scheduling an Oozie job(s) with mapreduce scripts(written in Python)

Guru

@justlearning

Oozie can't run mapreduce by itself; it is a Hadoop scheduler that launches workflows composed of jobs, which can be mapreduce jobs.

What you want here is to run a job defined by workflow.xml, with parameters in job.properties, so the syntax is:

oozie job --oozie http://sandbox.hortonworks.com:11000/oozie -config job.properties -run

Re: running/scheduling an Oozie job(s) with mapreduce scripts(written in Python)

OK, thank you @Laurent Edel

How do I configure my workflow.xml file, and what is the procedure to successfully execute an Oozie job with a mapreduce script written in Python?

Re: running/scheduling an Oozie job(s) with mapreduce scripts(written in Python)

Guru

You should consider running Hadoop streaming with your Python mapper and reducer.

Take a look at https://oozie.apache.org/docs/4.2.0/WorkflowFunctionalSpec.html#a3.2.2.3_Streaming for an example of such a workflow.

First try to execute your streaming job directly, with something like:

yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /user/theuser/input.csv -output /user/theuser/out
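For reference, here is a minimal sketch of what a streaming mapper and reducer could look like in Python. It assumes a word-count style job, which is not necessarily what your scripts do; the function names map_lines and reduce_lines are made up for illustration. Hadoop streaming feeds records on stdin, expects tab-separated key/value lines on stdout, and hands the reducer its input sorted by key.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Sum the counts per key; assumes the input is already sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"


# Hypothetical wiring, one script per role:
#   mapper.py:   for record in map_lines(sys.stdin): print(record)
#   reducer.py:  for record in reduce_lines(sys.stdin): print(record)
```

If the scripts are passed as "-mapper mapper.py" as in the command above, each one would also need a shebang line (for example #!/usr/bin/env python3) and execute permission; otherwise invoking them as "python mapper.py" is the usual alternative.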

Once that works, it will be easier to schedule it with Oozie. In the worst case, you can use a shell action that runs that command.

Please accept the answer if I answered your question.
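As a rough sketch of the Oozie side, the snippet below holds a workflow.xml with a streaming map-reduce action, modelled on the streaming example in the Oozie workflow specification, and writes it to disk after checking that it is well-formed. Everything specific in it is an assumption: the workflow and node names, the "python mapper.py" invocation, and the ${jobTracker}, ${nameNode}, ${inputDir} and ${outputDir} parameters, which would have to be defined in your job.properties. It also assumes mapper.py and reducer.py sit next to workflow.xml in the application directory on HDFS.

```python
import xml.etree.ElementTree as ET

# Illustrative workflow definition; names and parameters are assumptions.
WORKFLOW_XML = """<workflow-app name="python-streaming-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="streaming-node"/>
    <action name="streaming-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <streaming>
                <mapper>python mapper.py</mapper>
                <reducer>python reducer.py</reducer>
            </streaming>
            <configuration>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
            <file>mapper.py#mapper.py</file>
            <file>reducer.py#reducer.py</file>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Streaming job failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
"""


def write_workflow(path="workflow.xml"):
    """Check that the XML is well-formed, then write it to the given path."""
    ET.fromstring(WORKFLOW_XML)  # raises ParseError if the XML is malformed
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(WORKFLOW_XML)
    return path
```

The generated file would then be uploaded to the HDFS directory that oozie.wf.application.path in job.properties points to, together with the two scripts, before submitting with the oozie job command.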