New Contributor
Posts: 1
Registered: ‎10-08-2013

How to decide on the number of reducers and mappers in the cluster?

Hi

I am new to Hadoop.

I would like to know which factors are considered when setting up a Hadoop cluster to process a large volume of data.

I have read that the default block size is 64 MB or 128 MB. On what basis was that decided?

The number of mappers is decided by the number of input splits, the size of a split being the block size.
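
A minimal sketch of that relationship, for illustration only. It assumes the input behaves like one large splittable file and that the split size equals the block size; in practice splits are computed per file, so many small files produce more map tasks than this estimate. The function name `estimate_map_tasks` is made up for this example and is not a Hadoop API.

```python
# Illustrative arithmetic only, not a Hadoop API.
MB = 1024 ** 2
TB = 1024 ** 4


def estimate_map_tasks(input_bytes, split_bytes):
    """Approximate number of input splits, i.e. one map task per split.

    Assumes the input is effectively one large splittable file.
    Uses integer ceiling division so a partial last split still counts.
    """
    return (input_bytes + split_bytes - 1) // split_bytes


# 10 TB of input with the two block sizes mentioned above.
print(estimate_map_tasks(10 * TB, 128 * MB))  # 81920
print(estimate_map_tasks(10 * TB, 64 * MB))   # 163840
```

Halving the block size doubles the number of map tasks, which is why the block size matters for the mapper count.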

Also, I have seen several different answers. One says the number of reducers is directly proportional to the number of reducer slots in the cluster. Another gives a calculation: the number of MR slots on a machine is 0.75 * (number of cores). For example, if you have 4 physical cores or 8 virtual cores, you can have 0.75 * 8 = 6 MR slots, which you can then set as 3M+3R or 4M+2R and so on, as per your requirement. Yet another answer gives 0.95 * nodes * mapred.tasktracker.tasks.max (some property).
I am confused by the different answers.
Could somebody help me understand the logic behind calculating the number of mappers/reducers?
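
To make the two rules of thumb quoted above concrete, here is a small sketch of the arithmetic. The function names are hypothetical, the factors 0.75 and 0.95 are simply the ones from those answers, and the value of 2 reduce slots per node is an assumption borrowed from the 4M+2R example, not a recommendation.

```python
# Illustrative arithmetic for the quoted heuristics, not a Hadoop API.
from fractions import Fraction
import math


def mr_slots_per_node(cores, factor="0.75"):
    """Slots per node under the '0.75 * cores' rule of thumb."""
    # Fraction avoids floating-point rounding surprises before flooring.
    return math.floor(Fraction(factor) * cores)


def reducers_by_slots(nodes, reduce_slots_per_node, factor="0.95"):
    """Reducer count under the '0.95 * nodes * reduce slots per node' rule."""
    return math.floor(Fraction(factor) * nodes * reduce_slots_per_node)


slots = mr_slots_per_node(8)  # 8 virtual cores -> 6 slots
# Every way to divide the slots into (map, reduce) with at least one of each.
splits = [(m, slots - m) for m in range(1, slots)]
print(slots)   # 6
print(splits)  # [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]

# 20 nodes with an assumed 2 reduce slots each.
print(reducers_by_slots(20, 2))  # 38
```

The two formulas answer different questions: the first sizes the slots on one machine, the second picks a reducer count for a job given the cluster's total reduce slots.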
 
For example, if I have 10 TB of data and a 20-node cluster, with each node having 12 CPU cores at 2.4 GHz each, a 50 TB hard disk, and 4 GB of RAM:
what block size would you consider, and how many mappers and reducers in the cluster?
Posts: 416
Topics: 51
Kudos: 75
Solutions: 48
Registered: ‎06-26-2013

Re: How to decide on the number of reducers and mappers in the cluster?

@pruthvi I have moved this post to the MapReduce board since this is mostly an MR-related question. Hopefully somebody in here can help you.

 

Regards,

 

Clint

New Contributor
Posts: 2
Registered: ‎08-09-2017

Re: How to decide on the number of reducers and mappers in the cluster?

hi,

I am new to the community and not sure where the MapReduce board is. Can someone help?

Posts: 565
Kudos: 62
Solutions: 32
Registered: ‎04-06-2015

Re: How to decide on the number of reducers and mappers in the cluster?

You can find it here

 




Cy Jervis, Community Moderator - I'm not an expert but will supply relevant content from time to time. :)


New Contributor
Posts: 2
Registered: ‎08-09-2017

Re: How to decide on the number of reducers and mappers in the cluster?

Ah, so we are already on the right board. Thanks.

I am looking for an answer to the same question pruthvi has. Can someone help?
Expert Contributor
Posts: 244
Registered: ‎01-25-2017

Re: How to decide on the number of reducers and mappers in the cluster?

@pruthvi 4 GB of memory is low for Hadoop servers. It also depends on whether you process the data incrementally. Regarding the number of mappers, you can use CombineFileInputFormat and then decide how much input data each mapper gets. Regarding mappers and reducers per node: if you are using Cloudera, you should not need to care about this, since you specify the number of vcores to use per node, where it is generally advised to leave 1 core for the OS. So if you have a physical server with 12 CPU cores, you can set the cores per node to 11. If you are using vanilla Hadoop, then you need to know your jobs better in order to decide the ratio between mappers and reducers. It always depends on your SLA.
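
As a rough sketch of the vcore advice applied to the cluster in the question (20 nodes, 12 cores each): the helper names below are made up for illustration, and the code assumes each task uses exactly one vcore and ignores memory, which with 4 GB of RAM per node would likely be the tighter limit. The figure of 81,920 map tasks is 10 TB divided into 128 MB splits.

```python
# Illustrative arithmetic only; assumes one vcore per task and ignores memory.


def usable_vcores(physical_cores, reserved_for_os=1):
    """Cores left for tasks after reserving some for the OS."""
    return max(physical_cores - reserved_for_os, 0)


def task_waves(total_tasks, concurrent_tasks):
    """How many rounds are needed to run all tasks at a given concurrency."""
    return -(-total_tasks // concurrent_tasks)  # ceiling division


per_node = usable_vcores(12)      # 12 cores, 1 left for the OS
cluster_vcores = 20 * per_node    # 20 nodes
print(per_node)                   # 11
print(cluster_vcores)             # 220

# 10 TB in 128 MB splits is 81,920 map tasks.
print(task_waves(81920, cluster_vcores))  # 373
```

A larger split per mapper lowers the task count and therefore the number of waves, at the cost of longer individual tasks.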

 

 
