I am new to Hadoop.
I would like to know which factors to consider when setting up a Hadoop cluster to process a large volume of data.
I have read that the default block size is 64 MB or 128 MB. What factors determined that choice?
The number of mappers is decided by the number of input splits, with the split size being the block size.
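To make the split arithmetic concrete, here is a minimal sketch of how the mapper count follows from file sizes and block size. It assumes each file is split independently and that the split size equals the block size; the function name and the sample file sizes are hypothetical, and real input formats apply extra rules (for example, a small tail may be merged into the last split), so treat the result as an estimate.

```python
def estimate_map_tasks(file_sizes_bytes, split_size_bytes):
    """Estimate the number of map tasks as the total number of input splits."""
    # Integer ceiling division per file: a partial last block still needs a mapper.
    return sum(-(-size // split_size_bytes) for size in file_sizes_bytes)

MB = 1024 * 1024
# Hypothetical input: one 1 GB file and one 200 MB file, with 128 MB blocks.
print(estimate_map_tasks([1024 * MB, 200 * MB], 128 * MB))  # prints 10
```

The 1 GB file yields 8 splits and the 200 MB file yields 2, so a larger block size means fewer, longer-running mappers for the same data.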
I have also seen several answers saying that the number of reducers is directly proportional to the number of reducer slots in the cluster, and another
answer giving a calculation: on a 4-core machine you can have (0.75 * number of cores) MR slots.
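The rule of thumb quoted in that answer can be written out as a small calculation. This is only a sketch of that heuristic, not a Hadoop API: the function name is hypothetical, and rounding down with a minimum of one slot is my assumption, since the answer does not say how to round.

```python
def estimate_mr_slots(cores, factor=0.75):
    """Apply the quoted heuristic: slots = factor * number of cores."""
    # Assumption: round down, but never report fewer than one slot.
    return max(1, int(cores * factor))

print(estimate_mr_slots(4))  # prints 3
```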
@pruthvi 4 GB of memory is low for Hadoop servers; it also depends on whether you process the data incrementally. For the number of mappers, you can use CombineFileInputFormat and then decide how much input data each mapper receives. For the number of mappers and reducers per node: if you are using Cloudera, you should not need to worry about this, because you specify the number of vcores to use per node, and it is generally recommended to leave 1 core for the OS. So if you have a physical server with 12 CPU cores, you can set the cores per node to 11. If you are using vanilla Hadoop, you need to know your jobs well in order to decide the ratio between mappers and reducers, and it always depends on your SLA.
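The "leave one core for the OS" advice amounts to the following arithmetic. This is a sketch with a hypothetical helper name; on YARN the resulting number would typically go into a setting such as `yarn.nodemanager.resource.cpu-vcores`, but check your distribution's documentation before relying on that.

```python
def usable_vcores(physical_cores, reserved_for_os=1):
    """Cores to offer to Hadoop after reserving some for the operating system."""
    if physical_cores <= reserved_for_os:
        raise ValueError("not enough cores left after the OS reservation")
    return physical_cores - reserved_for_os

# Hypothetical node with 12 cores, reserving 1 for the OS.
print(usable_vcores(12))  # prints 11
```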