Member since: 09-07-2017 | Posts: 10 | Kudos Received: 1 | Solutions: 0
09-13-2017
01:48 PM
@Joseph Niemiec @Artem Ervits @narasimha meruva @Predrag Minovic
The partitioner is not invoked when used in an Oozie map-reduce action (workflow created using Hue), but it works as expected when the job is run with the hadoop jar command from the CLI.
I have implemented secondary sort in MapReduce and am trying to execute it using Oozie (from Hue).
Although I have set the partitioner class in the properties, the partitioner is not being executed, so I'm not getting the expected output.
The same code runs fine when run using the hadoop command.
Here is my workflow.xml:
<workflow-app name="MyTriplets" xmlns="uri:oozie:workflow:0.5">
    <start to="mapreduce-598d"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="mapreduce-598d">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.output.dir</name>
                    <value>/test_1109_3</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>/apps/hive/warehouse/7360_0609_rx/day=06-09-2017/hour=13/quarter=2/,/apps/hive/warehouse/7360_0609_tx/day=06-09-2017/hour=13/quarter=2/,/apps/hive/warehouse/7360_0509_util/day=05-09-2017/hour=16/quarter=1/</value>
                </property>
                <property>
                    <name>mapred.input.format.class</name>
                    <value>org.apache.hadoop.hive.ql.io.RCFileInputFormat</value>
                </property>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>PonRankMapper</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>PonRankReducer</value>
                </property>
                <property>
                    <name>mapred.output.value.comparator.class</name>
                    <value>PonRankGroupingComparator</value>
                </property>
                <property>
                    <name>mapred.mapoutput.key.class</name>
                    <value>PonRankPair</value>
                </property>
                <property>
                    <name>mapred.mapoutput.value.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapred.reduce.output.key.class</name>
                    <value>org.apache.hadoop.io.NullWritable</value>
                </property>
                <property>
                    <name>mapred.reduce.output.value.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapred.reduce.tasks</name>
                    <value>1</value>
                </property>
                <property>
                    <name>mapred.partitioner.class</name>
                    <value>PonRankPartitioner</value>
                </property>
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>False</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
When running with the hadoop jar command, I set the partitioner class using the JobConf.setPartitionerClass API.
I'm not sure why my partitioner is not executed when running through Oozie, in spite of adding the following property:
<property>
<name>mapred.partitioner.class</name>
<value>PonRankPartitioner</value>
</property>
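One thing that may be worth checking is whether the property names in the workflow match what the old-API JobConf setters actually write. The fragment below is only a sketch, not a confirmed fix. It assumes the classic mapred (old) API that the workflow above selects, and the property names shown are the ones JobConf.setOutputValueGroupingComparator, setOutputKeyClass and setOutputValueClass are understood to use in Hadoop 1.x; they should be verified against the job.xml of a successful CLI run. The reduce-task count of 2 is purely illustrative: with a single reduce task there is only one partition, so no partitioner can change where records go.

```xml
<!-- Sketch only: old-API (mapred) property names, to be checked against
     the job.xml produced by a working "hadoop jar" run -->
<property>
    <!-- Assumed to be the name written by JobConf.setOutputValueGroupingComparator -->
    <name>mapred.output.value.groupfn.class</name>
    <value>PonRankGroupingComparator</value>
</property>
<property>
    <!-- Assumed to be the name written by JobConf.setOutputKeyClass -->
    <name>mapred.output.key.class</name>
    <value>org.apache.hadoop.io.NullWritable</value>
</property>
<property>
    <!-- Assumed to be the name written by JobConf.setOutputValueClass -->
    <name>mapred.output.value.class</name>
    <value>org.apache.hadoop.io.Text</value>
</property>
<property>
    <!-- Illustrative value: more than one partition, so partitioning is observable -->
    <name>mapred.reduce.tasks</name>
    <value>2</value>
</property>
```

Comparing these names with the ones in the CLI run's job.xml would show which settings from the workflow the job actually picks up.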
@Joseph Niemiec @Artem Ervits
The partitioner is not invoked when used in an Oozie map-reduce action (workflow created using Hue), but it works as expected when the job is run with the hadoop jar command from the CLI.
I have implemented secondary sort in MapReduce and am trying to execute it using Oozie (from Hue).
Although I have set the partitioner class in workflow.xml, the partitioner is not being executed, so I'm not getting the expected output.
The same code runs fine when run using the hadoop jar command from the CLI. I am using the property below to set the partitioner in the Oozie workflow.xml:
<property>
<name>mapred.partitioner.class</name>
<value>PonRankPartitioner</value>
</property>
The mapper and reducer are invoked properly:
<property>
<name>mapred.mapper.class</name>
<value>PonRankMapper</value>
</property>
<property>
<name>mapred.reducer.class</name>
<value>PonRankReducer</value>
</property>
09-08-2017
01:16 AM
I was able to make the job run by adding the hive-exec jar to HADOOP_CLASSPATH as well as adding the jar to the distributed cache. Can you shed some light on why we need to both export the jar to the classpath and add it to the distributed cache?
09-07-2017
11:24 PM
Hi, I'm facing a similar issue with RCFileInputFormat. I'm executing simple code that reads from an RCFile in the mapper (using RCFileInputFormat) and does an aggregation on the reducer side. I am able to compile the code, but while running it I get a ClassNotFoundException for class org.apache.hadoop.hive.ql.io.RCFileInputFormat. I tried adding the jar to the Hadoop classpath, but no luck. Below is the stack trace.
--> hadoop jar MRJobRCFile.jar MRJobRCFile /apps/hive/warehouse/7360_0609_rx/day=06-09-2017/hour=13/quarter=2/ /test_9
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.ql.io.RCFileInputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1649)
at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:620)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:394)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.ql.io.RCFileInputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1617)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java
Should I investigate from jobconf.xml? If so, what do I need to check?