Hi, I am new to Hadoop. I would like to know the decision factors considered when setting up a Hadoop cluster for processing large volumes of data.

I have read that the default block size is 64 MB or 128 MB. On what factor was that decided? The number of mappers is decided by the number of input splits, the split size being the block size. For reducers I have seen several different answers: one says the number of reducers is directly proportional to the number of reducer slots in the cluster; another gives a calculation like 0.75 * (number of cores) for the number of MR slots, for example with 4 physical cores or 8 virtual cores you get 0.75 * 8 = 6 MR slots, which you can then set as 3M + 3R or 4M + 2R and so on as per your requirement; yet another says 0.95 * (number of nodes) * mapred.tasktracker.tasks.maximum (some property along those lines). I am confused by these different answers. Could somebody help me understand the logic behind calculating the number of mappers and reducers?

For example, if I have 10 TB of data and a 20-node cluster, with each node having 12 CPU cores at 2.4 GHz, 50 TB of hard disk, and 4 GB of RAM, what block size would you consider, and how many mappers and reducers for the cluster?
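To make my confusion concrete, here is a rough back-of-the-envelope sketch of the arithmetic as I understand it, just applying the rules of thumb quoted above to my cluster. I am assuming the 128 MB block size and, purely as a hypothetical, 4 reduce slots per node; the 0.75 and 0.95 factors are simply the figures from those answers, not something I know to be correct:

```python
# Rough arithmetic for the cluster described above.
# These are only the rules of thumb quoted in the question,
# not an authoritative sizing method.

TB = 1024 ** 4
MB = 1024 ** 2

total_data = 10 * TB          # 10 TB of input data
block_size = 128 * MB         # assuming the 128 MB default block size
nodes = 20                    # 20-node cluster
cores_per_node = 12           # 12 CPU cores per node

# Mappers: one map task per input split, split size == block size
num_mappers = total_data // block_size
print("Input splits / map tasks:", num_mappers)              # 81920

# MR slots per node using the 0.75 * cores rule of thumb
slots_per_node = int(0.75 * cores_per_node)
print("MR slots per node (0.75 * cores):", slots_per_node)   # 9

# Reducers using the 0.95 * nodes * reduce-slots rule of thumb,
# assuming (hypothetically) 4 reduce slots configured per node
reduce_slots_per_node = 4
num_reducers = int(0.95 * nodes * reduce_slots_per_node)
print("Reducers (0.95 * nodes * slots):", num_reducers)      # 76
```

Is this the right way to read those formulas, or am I mixing up slots and tasks?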