
Can we change the number of Mappers for a MapReduce job?

Contributor

How can we change the number of Mappers for a MapReduce job?

4 REPLIES

@rinu shrivastav

No, the number of map tasks for a given job is driven by the number of input splits: for each input split, one map task is spawned. So we cannot directly set the number of mappers through a configuration property; we can only change the number of input splits.
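That said, if you control the job driver, you can influence how many splits (and therefore mappers) are created by bounding the split size on FileInputFormat. A minimal sketch (the class name and paths are placeholders, not from the original post):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapperCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "mapper-count-demo");
        job.setJarByClass(MapperCountDriver.class);
        job.setInputFormatClass(TextInputFormat.class);

        // Splits (and hence mappers) are bounded by these sizes, not set directly.
        // Lowering the max split size yields more, smaller splits -> more mappers.
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);  // 64 MB
        FileInputFormat.setMinInputSplitSize(job, 32L * 1024 * 1024);  // 32 MB

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}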

Rising Star

We cannot directly change the number of mappers for a MapReduce job, but by changing the block size we can increase or decrease the number of mappers.


As we know
Number of input splits = Number of mappers

Example
If we have a 1 GB input file and the HDFS block size is 128 MB, then the number of input splits is 1024/128 = 8, so 8 mappers are allotted for the job.

If we reduce the block size from 128 MB to 64 MB, the same 1 GB input file will be divided into 1024/64 = 16 input splits, and the number of mappers will likewise be 16.

The block size can be changed in hdfs-site.xml by changing the value of dfs.block.size

<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
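Note that dfs.block.size in hdfs-site.xml only applies to files written after the change; existing files keep the block size they were written with. If you only need a different block size for one input file, you could also specify it when the file is written. A hedged sketch (the path and payload are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteWithBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        long blockSize = 64L * 1024 * 1024;   // 64 MB instead of the cluster default
        short replication = fs.getDefaultReplication(new Path("/"));
        int bufferSize = conf.getInt("io.file.buffer.size", 4096);

        // create(path, overwrite, bufferSize, replication, blockSize)
        FSDataOutputStream out = fs.create(
                new Path("/data/input/file.txt"), true, bufferSize, replication, blockSize);
        out.writeBytes("example payload\n");
        out.close();
    }
}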

Master Mentor

@rinu shrivastav

If you want a fixed number of reducers at runtime, you can set it when submitting the MapReduce job on the command line: passing "-D mapred.reduce.tasks" with the desired number will spawn that many reducers at runtime. The number of mappers for a MapReduce job, however, is driven by the number of input splits, and input splits depend on the block size. For example, if we have 500 MB of data and the HDFS block size is 128 MB, the number of mappers will be approximately 4.
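The reducer count can also be fixed programmatically in the driver; Job.setNumReduceTasks() is the code-level equivalent of -D mapred.reduce.tasks. A short sketch (the rest of the job setup is omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerCountExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "reducer-count-demo");
        // Reducers can be set directly; mappers cannot - they follow the input splits.
        job.setNumReduceTasks(2);
        // ... set input/output paths, mapper/reducer classes, then job.waitForCompletion(true)
    }
}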

When you run a Hadoop job on the CLI, you can use the -D switch to change the default number of mappers and reducers, for example (5 mappers, 2 reducers):

-D mapred.map.tasks=5 -D mapred.reduce.tasks=2

Example

bin/hadoop jar yourapp.jar -D mapreduce.job.maps=5
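Keep in mind that the -D generic options are only honored if the driver parses them via GenericOptionsParser, which is what ToolRunner does for you, and that the mapper count is still only a hint, since the input splits ultimately decide it. A minimal driver skeleton (the class name is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class YourApp extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains whatever was passed with -D on the command line,
        // e.g. mapreduce.job.maps=5 or mapred.reduce.tasks=2.
        Job job = Job.getInstance(getConf(), "your-app");
        job.setJarByClass(YourApp.class);
        // ... configure input/output formats, paths, mapper/reducer here
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner runs GenericOptionsParser, so -D key=value is applied to the Configuration.
        System.exit(ToolRunner.run(new Configuration(), new YourApp(), args));
    }
}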

HTH


@rinu shrivastav

The split size is calculated by the formula:

max(mapred.min.split.size, min(mapred.max.split.size, dfs.block.size))

Say the HDFS block size is 64 MB and mapred.min.split.size is set to 128 MB; then the split size will be 128 MB. To read 256 MB of data, there will be two mappers. To increase the number of mappers, you could decrease mapred.min.split.size down to the HDFS block size.

split size = max(128, min(256, 64)) = 128 MB
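To see the formula with these numbers, here is a tiny sketch that mirrors the max/min rule (sizes in MB; the 256 stands for mapred.max.split.size, which is an assumption, not a value set in the example above):

public class SplitSizeFormula {
    static long splitSize(long minSplitSize, long maxSplitSize, long blockSize) {
        // Same rule as above: max(minSize, min(maxSize, blockSize))
        return Math.max(minSplitSize, Math.min(maxSplitSize, blockSize));
    }

    public static void main(String[] args) {
        // min split 128 MB, max split 256 MB, block size 64 MB -> split size 128 MB
        System.out.println(splitSize(128, 256, 64));   // prints 128

        // Lowering the min split size back to the block size -> split size 64 MB,
        // so the same input produces twice as many splits (and mappers).
        System.out.println(splitSize(64, 256, 64));    // prints 64
    }
}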