I have read that there is a formula for calculating the Hadoop storage required for a given amount of data, but I do not fully understand it.
where c = average compression ratio, r = replication factor, S = size of the data to be moved to Hadoop, and i = intermediate factor.
The intermediate factor is usually 1/3 or 1/4. It represents the share of Hadoop's storage set aside as working space for the intermediate results of Map phases.
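To make the question concrete, here is a small sketch of how I understand the calculation. It assumes the formula has the commonly quoted form H = c * r * S / (1 - i), where H is the raw storage required; that form is my assumption, as are the function name `hadoop_storage` and all the sample values (100 TB of data, replication factor 3, c = 1 meaning no compression).

```python
def hadoop_storage(c, r, S, i):
    """Estimate required raw storage H, assuming H = c * r * S / (1 - i).

    c: average compression ratio (1.0 = no compression, assumed convention)
    r: replication factor
    S: size of the data to be moved to Hadoop (any unit; H uses the same unit)
    i: intermediate factor, the fraction reserved for intermediate Map output
    """
    if not 0 <= i < 1:
        raise ValueError("intermediate factor i must satisfy 0 <= i < 1")
    return c * r * S / (1 - i)


# Hypothetical inputs: 100 TB of uncompressed data, replication factor 3.
for i in (0.25, 1 / 3):
    print(f"i = {i:.3f}: H = {hadoop_storage(1.0, 3, 100.0, i):.1f} TB")
```

With these assumed inputs, i = 1/4 gives 400 TB and i = 1/3 gives 450 TB, so the choice of i changes the estimate noticeably.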
How do we decide the value of i? Which factors go into it, and why is its value either 1/3 or 1/4?