
cluster capacity planning


I have read that there is a formula for estimating the Hadoop storage required for a given volume of data, but I do not fully understand it.

Formula:

H=crS/(1-i)

where c = average compression ratio, r = replication factor, S = size of the data to be moved to Hadoop, and i = intermediate factor.

The intermediate factor is usually 1/3 or 1/4. It represents the share of Hadoop's space set aside as working space for storing the intermediate results of Map phases.
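To make the arithmetic concrete, here is a minimal Python sketch of the formula as I understand it. The function name `hadoop_storage` and the sample inputs (100 TB of data, replication factor 3, no compression) are my own illustrative assumptions, not recommended values. I am also assuming that c is the ratio of compressed size to original size, so c = 1 means no compression.

```python
def hadoop_storage(c, r, S, i):
    """Estimate required storage H = c * r * S / (1 - i).

    c: average compression ratio (assumed: compressed size / original size)
    r: replication factor
    S: size of the data to be moved to Hadoop (any unit; H is in the same unit)
    i: intermediate factor, the fraction reserved for intermediate results
    """
    if not 0 <= i < 1:
        raise ValueError("i must satisfy 0 <= i < 1")
    return c * r * S / (1 - i)


# Hypothetical inputs: 100 TB of uncompressed data, replication factor 3.
for i in (1 / 4, 1 / 3):
    H = hadoop_storage(c=1.0, r=3, S=100, i=i)
    print(f"i = {i:.3f}: H is about {H:.1f} TB")
```

With these inputs the estimate comes out to about 400 TB for i = 1/4 and about 450 TB for i = 1/3, so a larger i means a larger share of the total is reserved as working space.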

How do we decide the value of i? Which factors go into it, and why is its value usually either 1/3 or 1/4?

Thanks
