Welcome to the Cloudera Community

How to decide on the number of reducers and mappers in the cluster?

New Contributor

Hi

I am new to Hadoop.

I would like to know the decision factors considered when setting up a Hadoop cluster for processing large volumes of data.

I have read that the default block size is 64 MB or 128 MB. On what factors was this decided?

The number of mappers is decided by the number of input splits, the split size typically being the block size.
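For anyone following along, that rule can be sketched in a few lines (assuming the classic MapReduce behavior where each input split gets one map task, and the split size defaults to the HDFS block size):

```python
import math

def num_mappers(total_input_bytes: int, split_bytes: int) -> int:
    # One map task per input split; the split size usually
    # equals the HDFS block size (64 MB or 128 MB by default).
    return math.ceil(total_input_bytes / split_bytes)

# 1 TB of input with a 128 MB block/split size -> 8192 map tasks
print(num_mappers(1 * 1024**4, 128 * 1024**2))
```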

Also, I have seen several different answers. One says the number of reducers is directly proportional to the number of reducer slots in the cluster. Another gives a calculation like: on a machine with 4 cores you can have 0.75 * (number of cores) MR slots. For example, if you have 4 physical cores or 8 virtual cores, you can have 0.75 * 8 = 6 MR slots, and can then set 3M + 3R or 4M + 2R, and so on, as per your requirement. Yet another answer suggests 0.95 * nodes * mapred.tasktracker.tasks.max (some property).
I am confused by the different answers.
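The rules of thumb quoted above can be written out explicitly. Note these are community heuristics, not official Hadoop formulas, and the property name (probably `mapred.tasktracker.reduce.tasks.maximum` in Hadoop 1.x) is an assumption:

```python
# Heuristic 1: MR slots per node ~ 0.75 * number of (virtual) cores,
# leaving the remaining capacity for the OS, DataNode, and TaskTracker daemons.
def slots_per_node(cores: int, factor: float = 0.75) -> int:
    return int(cores * factor)

# Heuristic 2: reducers ~ 0.95 * nodes * reduce-slots-per-node
# (the 0.95 leaves headroom so a failed reduce can be rescheduled in one wave).
def num_reducers(nodes: int, reduce_slots_per_node: int, factor: float = 0.95) -> int:
    return int(factor * nodes * reduce_slots_per_node)

print(slots_per_node(8))    # 8 virtual cores -> 6 slots (e.g. 3 map + 3 reduce)
print(num_reducers(20, 6))  # 20 nodes, 6 reduce slots each -> 114 reducers
```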
Could somebody help me understand the logic behind calculating the number of mappers and reducers?
 
For example, suppose I have 10 TB of data and a 20-node cluster, with each node having 12 CPU cores at 2.4 GHz, a 50 TB hard disk, and 4 GB of RAM.
What block size would you consider, and how many mappers and reducers would you use in the cluster?
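For what it's worth, plugging those numbers into the heuristics quoted earlier gives a rough starting point. This is only a sketch under the assumption of a 128 MB block size and an even map/reduce slot split; the right values still depend on the actual jobs:

```python
import math

TB = 1024**4
MB = 1024**2

data = 10 * TB   # total input
block = 128 * MB # assumed HDFS block size = split size
nodes = 20
cores = 12

mappers = math.ceil(data / block)            # one map task per 128 MB split
slots = int(0.75 * cores)                    # heuristic MR slots per node
reducers = int(0.95 * nodes * (slots // 2))  # if half the slots go to reduce

print(mappers)   # 81920 map tasks (run in waves, not all at once)
print(slots)     # 9 slots per node
print(reducers)  # 76 reducers
```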