Created 01-06-2017 07:32 PM
I think I am facing a configuration issue. Here is the workflow.xml file I am using in an attempt to run/submit the Oozie job with MapReduce scripts. I am using the command: $ oozie mapreduce -oozie http://localhost:11000/oozie -config job.properties
according to:
http://oozie.apache.org/docs/4.1.0/DG_CommandLineTool.html#Oozie_Command_Line_Usage
"the parameters must be in the Java Properties file (.properties). This file must be specified for a map-reduce job. The properties file must specify the mapred.mapper.class, mapred.reducer.class, mapred.input.dir, mapred.output.dir, oozie.libpath, mapred.job.tracker, and fs.default.name properties."
The map-reduce job will be created and submitted. All JAR files and all other files needed by the MapReduce job need to be uploaded onto HDFS under the libpath beforehand. The workflow.xml will be created in the Oozie server internally. Users can get the workflow.xml from the console or the command line (-definition).
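Based on the property list quoted above, a minimal job.properties for this kind of submission might look like the following sketch (host names, ports, HDFS paths, and the mapper/reducer class names are placeholders you would replace with your cluster's values):

```properties
# Placeholder cluster endpoints -- substitute your NameNode and JobTracker/RM
fs.default.name=hdfs://localhost:8020
mapred.job.tracker=localhost:8032

# Illustrative mapper/reducer classes; their JAR must already be in the libpath
mapred.mapper.class=org.apache.oozie.example.SampleMapper
mapred.reducer.class=org.apache.oozie.example.SampleReducer

# Placeholder HDFS input/output paths
mapred.input.dir=/user/myuser/input
mapred.output.dir=/user/myuser/output

# HDFS directory that holds the job JARs, uploaded beforehand
oozie.libpath=hdfs://localhost:8020/user/myuser/lib
```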
However, I am getting this error: mapreduce-oozie-error.png
I am not sure how to configure my workflow.xml file, or the procedure to successfully execute an Oozie job with a MapReduce script written in Python. @Artem Ervits
Created 01-06-2017 09:41 PM
Oozie can't do MapReduce by itself; it's a Hadoop scheduler which launches workflows composed of jobs, and those jobs can be MapReduce jobs.
Here you want to run a job defined by workflow.xml with parameters in job.properties, so the syntax is
oozie job --oozie http://sandbox.hortonworks.com:11000/oozie -config job.properties -run
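Since the question is how to configure workflow.xml: a minimal workflow with a single map-reduce action looks roughly like the sketch below (the app name, mapper/reducer classes, and parameter names are illustrative placeholders, wired to values you would define in job.properties):

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="sample-mr-wf">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.apache.oozie.example.SampleMapper</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.apache.oozie.example.SampleReducer</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The workflow.xml goes in the application directory on HDFS, and job.properties supplies jobTracker, nameNode, inputDir, and outputDir.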
Created 01-06-2017 10:02 PM
Ok, thank you @Laurent Edel
How do I configure my workflow.xml file, or what is the procedure to successfully execute an Oozie job with a MapReduce script written in Python?
Created 01-06-2017 10:27 PM
You should consider running hadoop streaming using your python mapper and reducer.
Take a look at https://oozie.apache.org/docs/4.2.0/WorkflowFunctionalSpec.html#a3.2.2.3_Streaming for an example of such a workflow
Try first to execute your streaming directly with something like
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /user/theuser/input.csv -output /user/theuser/out
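If you don't already have your mapper and reducer, a minimal streaming word-count pair might look like the sketch below. For compactness both halves are shown in one file; in practice you would split the map side into mapper.py and the reduce side into reducer.py (matching the file names in the command above), each reading sys.stdin, and make both executable with a shebang line:

```python
#!/usr/bin/env python
# Sketch of a streaming word count. In a real job, map_line drives mapper.py
# and reduce_pairs drives reducer.py; Hadoop streaming pipes sorted
# tab-separated key/value lines between them.
import sys
from itertools import groupby


def map_line(line):
    """Mapper logic: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def reduce_pairs(pairs):
    """Reducer logic: sum counts for runs of identical keys. The input is
    assumed sorted by key, which is what the Hadoop shuffle guarantees."""
    return [(key, sum(count for _, count in group))
            for key, group in groupby(pairs, key=lambda kv: kv[0])]


if __name__ == "__main__":
    # mapper.py body: emit tab-separated key/value lines for the shuffle.
    for line in sys.stdin:
        for word, count in map_line(line):
            sys.stdout.write("%s\t%d\n" % (word, count))
```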
Then it'll be easier to schedule that with Oozie; worst case scenario, you can wrap that command in a shell action.
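If you do go the shell-action route, the action in workflow.xml would look roughly like this sketch (the script name is a placeholder; run_streaming.sh would contain the yarn jar command above and sit in the workflow application directory on HDFS):

```xml
<action name="streaming-shell">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>run_streaming.sh</exec>
        <file>run_streaming.sh#run_streaming.sh</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>
```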
Please accept answer if I answered your question