How can I change or configure the number of mappers?
Hi @Dukool SHarma
The number of map tasks for a given job is driven by the number of input splits: each input split is processed by exactly one map task. A split is a logical division of the input data (unlike an HDFS block, which is a physical division) that MapReduce uses to hand out work to mappers.
Suppose you have a 200MB file and the default HDFS block size is 128MB. The file is then divided into two splits, one per block.
But if you set the minimum split size to 200MB in your MapReduce program, both blocks are treated as a single split and only one mapper is assigned to process the file.
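To make the arithmetic concrete, here is a small Java sketch of how the new-API FileInputFormat sizes splits for one file. It assumes the rule splitSize = max(minSize, min(maxSize, blockSize)) and the 1.1 "slop" factor that lets the last split run up to 10% over the split size. It only does the arithmetic (no HDFS access), and the class and method names are hypothetical.

```java
// Hypothetical helper that mimics FileInputFormat's split arithmetic for a single file.
// Assumes splitSize = max(minSize, min(maxSize, blockSize)) and a slop factor of 1.1.
public class SplitCount {
    static final long MB = 1024L * 1024L;
    static final double SPLIT_SLOP = 1.1; // last split may be up to 10% larger than splitSize

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (and therefore map tasks) for one file of fileSize bytes.
    static int countSplits(long fileSize, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileSize;
        int splits = 0;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // whatever is left becomes the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long file = 200 * MB;
        long block = 128 * MB;
        // Defaults: minSize = 1 byte, maxSize = Long.MAX_VALUE, so split size = block size
        System.out.println(countSplits(file, block, 1L, Long.MAX_VALUE));        // 2
        // Minimum split size raised to 200 MB: one split covering both blocks
        System.out.println(countSplits(file, block, 200 * MB, Long.MAX_VALUE));  // 1
    }
}
```

With the defaults the split size equals the block size, so the 200MB file yields a 128MB split plus a 72MB split; raising the minimum above the block size is what merges them.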
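If you want roughly n mappers, set the maximum split size to the file size divided by n (keeping the minimum split size below that value). For the 200MB file above, a 40MB maximum split size gives five splits: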
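// Old (deprecated) names; on Hadoop 2+ the current names are mapreduce.input.fileinputformat.split.maxsize and mapreduce.input.fileinputformat.split.minsize
conf.set("mapred.max.split.size", "41943040"); // maximum split size in bytes (40MB)
conf.set("mapred.min.split.size", "20971520"); // minimum split size in bytes (20MB)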
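The "divide the file size by n" step can be sketched in Java as below. This is an illustration under assumptions: a single input file, a target size smaller than the block size (otherwise the block size wins), and the same max(minSize, min(maxSize, blockSize)) rule with a 1.1 slop factor. The class and method names are hypothetical. For the 200MB file and n = 5 it produces 41943040 bytes, the same maximum used in the conf.set lines.

```java
// Hypothetical helper: pick a maximum split size that yields about n map tasks
// for one file, then check it with FileInputFormat-style split arithmetic.
public class MapperTarget {
    static final double SPLIT_SLOP = 1.1;

    // Ceiling division so that n splits of this size always cover the whole file.
    static long maxSplitSizeFor(long fileSize, int n) {
        return (fileSize + n - 1) / n;
    }

    static int countSplits(long fileSize, long blockSize, long minSize, long maxSize) {
        long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));
        long remaining = fileSize;
        int splits = 0;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        return remaining != 0 ? splits + 1 : splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 200 * mb;
        long maxSplit = maxSplitSizeFor(fileSize, 5);
        System.out.println(maxSplit); // 41943040
        // 20 MB minimum and 40 MB maximum as in the conf.set(...) lines, 128 MB blocks
        System.out.println(countSplits(fileSize, 128 * mb, 20 * mb, maxSplit)); // 5
    }
}
```

Because of the slop factor, a file whose last chunk would be very small can come out one split short of n, which is why the target is "roughly" n.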
The number of mappers always equals the number of input splits. You can control the number of splits by changing mapred.min.split.size, which sets the minimum input split size.
Assume the block size is 64 MB and mapred.min.split.size is set to 128 MB.
Each InputSplit will then be 128 MB, even though the block size is 64 MB.
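A minimal Java sketch of this example follows, assuming the FileInputFormat rule splitSize = max(minSize, min(maxSize, blockSize)) with the maximum left at its default of Long.MAX_VALUE. The 1 GB file is hypothetical, chosen as an exact multiple of both sizes so the slop factor on the last split does not come into play; the class name is also hypothetical.

```java
// Hypothetical illustration of the 64 MB block / 128 MB minimum split example.
// Assumes splitSize = max(minSize, min(maxSize, blockSize)), as in FileInputFormat.
public class MinSplitExample {
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 64 * mb;
        long size = splitSize(block, 128 * mb, Long.MAX_VALUE);
        System.out.println(size / mb); // 128
        // Hypothetical 1 GB file: mappers drop from 1024/64 = 16 to 1024/128 = 8
        long file = 1024 * mb;
        System.out.println(file / block + " -> " + file / size); // 16 -> 8
    }
}
```

Raising the minimum above the block size is therefore a way to reduce the number of mappers, at the cost of each split spanning more than one block.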