How to change/configure the number of mappers?

How can I change/configure the number of mappers?

4 REPLIES

Cloudera Employee

Hi @Dukool SHarma

The number of map tasks for a given job is driven by the number of input splits, so the number of map tasks is equal to the number of input splits. A split is a logical division of the input data, used when the data is processed by a MapReduce program (unlike an HDFS block, which is a physical division of the stored data).

Suppose you have a 200 MB file and the HDFS block size is the default 128 MB. The file is stored in two blocks, so by default it will be processed as two splits.
But if you specify a larger split size (say 200 MB) in your MapReduce program, both blocks will be treated as a single split and only one mapper will be assigned to the job.

If you want roughly n mappers, set the maximum split size to the file size divided by n, using these parameters:

conf.set("mapred.max.split.size", "41943040"); // maximum split size in bytes (40 MB)

conf.set("mapred.min.split.size", "20971520"); // minimum split size in bytes (20 MB)
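
For context, here is a minimal driver sketch that applies these settings before submitting the job; the class name, identity mapper, and command-line paths are placeholders I am assuming, not part of the original answer. The mapred.* names still work but are deprecated; on Hadoop 2+ the equivalent properties are mapreduce.input.fileinputformat.split.maxsize and mapreduce.input.fileinputformat.split.minsize.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SplitSizeDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Deprecated property names used in the reply above:
        conf.set("mapred.max.split.size", "41943040");  // 40 MB
        conf.set("mapred.min.split.size", "20971520");  // 20 MB

        // Hadoop 2+ equivalents:
        conf.set("mapreduce.input.fileinputformat.split.maxsize", "41943040");
        conf.set("mapreduce.input.fileinputformat.split.minsize", "20971520");

        Job job = Job.getInstance(conf, "split-size-demo");
        job.setJarByClass(SplitSizeDriver.class);
        job.setMapperClass(Mapper.class);           // identity mapper, just passes records through
        job.setNumReduceTasks(0);                   // map-only job, so the mapper count is easy to observe
        job.setOutputKeyClass(LongWritable.class);  // TextInputFormat keys (byte offsets)
        job.setOutputValueClass(Text.class);        // TextInputFormat values (lines)

        FileInputFormat.addInputPath(job, new Path(args[0]));    // input path from the command line
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output path from the command line
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}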

Please accept my answer if it is found helpful.

The number of mappers always equals the number of input splits. We can control the number of splits by changing mapred.min.split.size, which controls the minimum input split size.

Assume the block size is 64 MB and mapred.min.split.size is set to 128 MB.
The size of InputSplit will be 128 MB even though the block size is 64 MB.
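
This follows from the split-size rule used by FileInputFormat, which is max(minSize, min(maxSize, blockSize)). A small sketch with the numbers above (the class is just an illustration, not Hadoop code):

public class SplitSizeCheck {
    // Same rule FileInputFormat applies when computing split sizes:
    // max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // 64 MB HDFS block
        long minSize   = 128L * 1024 * 1024;  // mapred.min.split.size set to 128 MB
        long maxSize   = Long.MAX_VALUE;      // maximum split size left at its default
        // Prints 134217728 (128 MB): each split spans two blocks, so fewer mappers run.
        System.out.println(computeSplitSize(blockSize, minSize, maxSize));
    }
}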

Mentor

@Dukool SHarma

Yes, you can set it when launching the job from the command line with a -D generic option, as shown below. The -D options go after the jar (and the main class, if the jar's manifest does not name one), and they only take effect if the driver parses generic options, e.g. via ToolRunner. Also note that mapreduce.job.maps is just a hint; for file-based input formats the actual number of mappers is still determined by the input splits.

bin/hadoop jar yourapp.jar -Dmapreduce.job.maps=5 ...

HTH
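
For reference, a minimal ToolRunner-based driver sketch is below; the class name YourApp, the identity mapper, and the argument handling are assumptions for illustration, not from this thread. Going through ToolRunner is what makes -D generic options on the command line take effect.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class YourApp extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains any -D options parsed from the command line.
        Job job = Job.getInstance(getConf(), "mapper-count-demo");
        job.setJarByClass(YourApp.class);
        job.setMapperClass(Mapper.class);  // identity mapper
        job.setNumReduceTasks(0);          // map-only job
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips generic options (-D, -files, -libjars, ...) before calling run().
        System.exit(ToolRunner.run(new Configuration(), new YourApp(), args));
    }
}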

Mentor

@Dukool SHarma

Any updates?
