Configure an Oozie MapReduce action
Created on ‎02-04-2014 01:41 PM - edited ‎09-16-2022 01:53 AM
I'd like to use Hue to configure an Oozie workflow that consists of a MapReduce job, but I'm having a hard time figuring out where the arguments go.
For example, suppose I want to run the famous wordcount jar, with the twist that it uses a date variable which I will define in a coordinator:
$ bin/hadoop jar /usr/joe/wordcount.jar org.myorg.WordCount /usr/joe/wordcount/input/${date} /usr/joe/wordcount/output/${date}
In Hue-Oozie, it is only obvious to me where the jar file is defined. But what about:
- the class name
- the input path
- the output path
How do I specify these pieces? I wish there were an Oozie workflow video showing how to define a MapReduce action.
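For the ${date} part of the question, a coordinator can pass a materialization-time date into the workflow as a parameter, which the action's input/output paths can then reference. A minimal sketch, assuming illustrative names and paths (wordcount-coord, /user/joe/wordcount-wf are made up for the example):

```xml
<coordinator-app name="wordcount-coord" frequency="${coord:days(1)}"
                 start="2014-02-01T00:00Z" end="2014-12-31T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>${nameNode}/user/joe/wordcount-wf</app-path>
      <configuration>
        <!-- "date" becomes a workflow parameter, usable in the action as
             /usr/joe/wordcount/input/${date} and /usr/joe/wordcount/output/${date} -->
        <property>
          <name>date</name>
          <value>${coord:formatTime(coord:nominalTime(), 'yyyy-MM-dd')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```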
Created ‎02-04-2014 01:57 PM
Hello,
Here is an example if you are using a Driver class:
Steps:
- Pull the Source code for the new PiEstimator and compile with Maven. Requires Git, Maven and Java:
- git clone https://github.com/cmconner156/oozie_pi_load_test.git
- cd oozie_pi_load_test/PiEstimatorKrbSrc
- vi pom.xml
- set hadoop-core and hadoop-client to match your version.
- mvn clean install
- Copy oozie_pi_load_test/PiEstimatorKrbSrc/target/PiEstimatorKrb-1.0.jar to some location in HDFS. Make sure it's readable by whichever Hue user will run the workflow.
- Go to Hue browser and go to the Oozie app
- Go to the Workflows tab
- Click "Create"
- Enter a name and description
- Click Save
- Drag "Java" from the actions above to the slot between "Start" and "end"
- Give it a name and description
- For the Jar name, click the browse button
- Find the PiEstimatorKrb-1.0.jar file you put in HDFS
- For "Main Class" enter "com.test.PiEstimatorKrb"
- For "Arguments" enter "<tempdir> <nMaps> <nSamples>" by replacing those with correct values. For example "/user/cconner/pi_temp 4 1000", base the nMaps and nSamples on what you would normally use for the Pi example.
- Click "add path" next to "Files" and search for PiEstimatorKrb-1.0.jar in HDFS.
- Click Done.
- Click Save.
- Click Submit on the left.
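For reference, the steps above should produce a workflow whose Java action looks roughly like this sketch (the jar's HDFS path is an assumption here; the class name and argument values are the examples from the steps):

```xml
<workflow-app name="pi-estimator" xmlns="uri:oozie:workflow:0.4">
  <start to="pi-java"/>
  <action name="pi-java">
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <main-class>com.test.PiEstimatorKrb</main-class>
      <!-- arguments: <tempdir> <nMaps> <nSamples> -->
      <arg>/user/cconner/pi_temp</arg>
      <arg>4</arg>
      <arg>1000</arg>
      <!-- assumed upload location for the jar -->
      <file>/user/cconner/PiEstimatorKrb-1.0.jar#PiEstimatorKrb-1.0.jar</file>
    </java>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Java action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```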
Here is an example not using a driver class:
Steps:
- Put /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar somewhere in HDFS; I put it in /user/oozie and made it readable by everyone.
- Create a directory in HDFS for the job. I did "hadoop fs -mkdir teragen_oozie".
- Create an empty input directory in HDFS for the job. I did "hadoop fs -mkdir teragen_oozie/input".
- Go into Hue->Oozie and click "Create workflow".
- Enter a Name, Description, and "HDFS deployment directory", setting the latter to the location above.
- Click Save.
- Click the + button for Mapreduce.
- Enter a name for the MR task.
- For the Jar name, browse to the location where you put hadoop-mapreduce-examples.jar above.
- Click "Add Property" for Job Properties and add the following:
  - mapred.input.dir = hdfs://cdh412-1.test.com:8020/user/admin/teragen_oozie/input
  - mapred.output.dir = hdfs://cdh412-1.test.com:8020/user/admin/teragen_oozie/output
  - mapred.mapper.class = org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper
  - terasort.num-rows = 500
- Click "Add delete" for Prepare and specify "hdfs://cdh412-1.test.com:8020/user/admin/teragen_oozie/output" as the location.
- Click Save.
- Now run the workflow and it should succeed.
NOTE: change cdh412-1.test.com:8020 to be the correct NN for your environment.
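Assembled, the map-reduce action that Hue generates from these steps should look roughly like this sketch (the action name is made up; property values are the ones listed above, with your own NameNode host substituted for cdh412-1.test.com:8020):

```xml
<action name="teragen">
  <map-reduce>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <prepare>
      <delete path="hdfs://cdh412-1.test.com:8020/user/admin/teragen_oozie/output"/>
    </prepare>
    <configuration>
      <property>
        <name>mapred.input.dir</name>
        <value>hdfs://cdh412-1.test.com:8020/user/admin/teragen_oozie/input</value>
      </property>
      <property>
        <name>mapred.output.dir</name>
        <value>hdfs://cdh412-1.test.com:8020/user/admin/teragen_oozie/output</value>
      </property>
      <property>
        <name>mapred.mapper.class</name>
        <value>org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper</value>
      </property>
      <property>
        <name>terasort.num-rows</name>
        <value>500</value>
      </property>
    </configuration>
  </map-reduce>
  <ok to="end"/>
  <error to="fail"/>
</action>
```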
Hope this helps!
Created ‎02-05-2014 05:03 PM
"Current workaround, don't use a full path for the jar file or don't put the 'Jar name' into lib, just put it one level up:"
This is fixed in CDH5
Created ‎02-05-2014 02:10 PM
I'm getting a "class not found" error for the driver class. I uploaded the jar into the workflow's workspace folder. The jar is set in the "Jar" field and also added to Files and Archives. The driver class is definitely in the jar. What am I doing wrong?
Created ‎02-05-2014 02:34 PM
I am answering my own question about the ClassNotFoundException. Since I had uploaded the jar into the workflow's workspace, when I picked it for the "Jar" and "Files" fields, the generated workflow.xml showed the jar path as relative to the workspace.
That did not seem to work.
I then uploaded the same jar to a different HDFS location outside the workspace, regenerated the workflow.xml, and saw that the jar path was fully qualified. This time it worked.
I would have thought the relative path should work as well. Anyhow, this is for others who run into the same issue.
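To illustrate the difference described above (jar name and paths are illustrative, not taken from the actual workflow), the two generated <file> entries would look something like:

```xml
<!-- jar inside the workflow workspace: path generated relative to the
     workspace, which failed with ClassNotFoundException -->
<file>wordcount.jar#wordcount.jar</file>

<!-- jar uploaded elsewhere in HDFS: path generated fully qualified,
     which worked -->
<file>hdfs://namenode.example.com:8020/user/joe/lib/wordcount.jar#wordcount.jar</file>
```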
Created ‎09-12-2017 12:09 PM
The partitioner is not invoked when used in an Oozie MapReduce action (workflow created via Hue), but it works as expected when run with the hadoop jar command from the CLI.
I have implemented a secondary sort in MapReduce and am trying to execute it using Oozie (from Hue).
Though I have set the partitioner class in the properties, the partitioner is not being executed, so I am not getting the output I expect.
The same code runs fine when run with the hadoop command.
Here is my workflow.xml:
<workflow-app name="MyTriplets" xmlns="uri:oozie:workflow:0.5">
  <start to="mapreduce-598d"/>
  <kill name="Kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <action name="mapreduce-598d">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.output.dir</name>
          <value>/test_1109_3</value>
        </property>
        <property>
          <name>mapred.input.dir</name>
          <value>/apps/hive/warehouse/7360_0609_rx/day=06-09-2017/hour=13/quarter=2/,/apps/hive/warehouse/7360_0609_tx/day=06-09-2017/hour=13/quarter=2/,/apps/hive/warehouse/7360_0509_util/day=05-09-2017/hour=16/quarter=1/</value>
        </property>
        <property>
          <name>mapred.input.format.class</name>
          <value>org.apache.hadoop.hive.ql.io.RCFileInputFormat</value>
        </property>
        <property>
          <name>mapred.mapper.class</name>
          <value>PonRankMapper</value>
        </property>
        <property>
          <name>mapred.reducer.class</name>
          <value>PonRankReducer</value>
        </property>
        <property>
          <name>mapred.output.value.comparator.class</name>
          <value>PonRankGroupingComparator</value>
        </property>
        <property>
          <name>mapred.mapoutput.key.class</name>
          <value>PonRankPair</value>
        </property>
        <property>
          <name>mapred.mapoutput.value.class</name>
          <value>org.apache.hadoop.io.Text</value>
        </property>
        <property>
          <name>mapred.reduce.output.key.class</name>
          <value>org.apache.hadoop.io.NullWritable</value>
        </property>
        <property>
          <name>mapred.reduce.output.value.class</name>
          <value>org.apache.hadoop.io.Text</value>
        </property>
        <property>
          <name>mapred.reduce.tasks</name>
          <value>1</value>
        </property>
        <property>
          <name>mapred.partitioner.class</name>
          <value>PonRankPartitioner</value>
        </property>
        <property>
          <name>mapred.mapper.new-api</name>
          <value>False</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="End"/>
    <error to="Kill"/>
  </action>
  <end name="End"/>
</workflow-app>
When running with the hadoop jar command, I set the partitioner class using the JobConf.setPartitionerClass API.
I am not sure why my partitioner is not executed when running via Oozie, despite adding:
<property> <name>mapred.partitioner.class</name> <value>PonRankPartitioner</value> </property>
