cluster capacity planning

I have read that there is a formula for calculating the Hadoop storage required for a given amount of data, but I do not fully understand it.

Formula for the required Hadoop storage H:

H = c * r * S / (1 - i)

where c = average compression ratio, r = replication factor, S = size of the data to be moved to Hadoop, and i = intermediate factor.

The intermediate factor i is usually given as 1/3 or 1/4. It accounts for Hadoop's working space, which is dedicated to storing the intermediate results of Map phases.
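To make the effect of i concrete, here is a rough sketch of the calculation as I understand it. It assumes the formula has the commonly quoted form H = c * r * S / (1 - i). The function name `hadoop_storage` and the input values (300 TB of data, no compression so c = 1, replication factor 3) are made up for illustration only.

```python
def hadoop_storage(c, r, S, i):
    """Estimate required storage, assuming H = c * r * S / (1 - i).

    c: average compression ratio (1.0 is taken here to mean no compression)
    r: replication factor
    S: size of the data to be moved to Hadoop
    i: intermediate factor, the fraction reserved for intermediate Map output
    """
    if not 0 <= i < 1:
        raise ValueError("i must be in the range [0, 1)")
    return c * r * S / (1 - i)


# Hypothetical inputs, chosen only to illustrate the arithmetic.
S_TB = 300   # data size in TB
C = 1.0      # assumed: no compression
R = 3        # replication factor

for i in (1 / 4, 1 / 3):
    h = hadoop_storage(C, R, S_TB, i)
    print(f"i = {i:.3f}: H is about {h:.0f} TB")
```

Under that assumed formula, dividing by (1 - i) means that i = 1/4 scales the replicated data size by 4/3 and i = 1/3 scales it by 3/2. In other words, i would be the fraction of the total disk space kept free for intermediate results.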

How do we decide the value of i? Which factors go into it, and why is its value either 1/3 or 1/4?