Support Questions

Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

How to write a custom partitioner for a MapReduce job?

 
1 REPLY 1

Firstly lets understand why we need partitioning inMapReduceFramework:

As we know that Map task take inputsplit as input and produces key,value pair as output. This key-value pairs are then feed to reduce task. But before the reduce phase , one more phase know as partitioning phase runs. This phase partition the map output based on key and keeps the record of the same key into the same partitions.

Lets take an example of Employee Analysis:
We want to find the highest paid Female and male employee from the data set.
Data Set:
Name Age Dept Gender Salary
A 23 IT Male 35
B 35 Finance Female 50
C 29 IT Male 40

Considering two map tasks gives following <k,v> as output:

Map1 o/p:

Key Value
Gender Value

Male A 23 IT Male 35
Female B 35 Finance Female 50

Map2 o/p
Key Value
Gender Value

Male C 29 IT Male 40

So , lets understand how to implement custom partitioner:

Our custom partitioner will send all <K,V> by Gender Male to one partition and <K,V> with Female to other partition .

here is the code:

public static class MyPartitioner extends Partitioner<Text,Text>{
public int getPartition(Text key, Text value, int numReduceTasks){
if(numReduceTasks==0)
return 0;
if(key.equals(new Text("Male")) )
return 0;
if(key.equals(new Text("Female")))
return 1;
}
}

Here , the getPartition() will return 0 if the key is Male and 1 if key is Female.

We can check our output in two files:
part-r-0000 and part-r-0001.

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.