Support Questions

sharma_dukool13 · ‎05-26-2018

How do we calculate the Hadoop cluster size for our project?

bansal_himani13 · ‎05-26-2018

The following formula is used to estimate the storage size of a Hadoop cluster:

H = c × R × S / (1 - i)
Where H = the Hadoop storage required.
c = average compression ratio. It depends on the type of compression used and the size of the data. When no compression is used, c is 1.
R = replication factor. It is set to 3 by default in a production cluster.
S = size of the data to be moved to Hadoop. This can be a combination of historical data and incremental data. The incremental data can be, for example, a daily volume projected over a period of time (3 years, for example).
i = intermediate factor, usually 1/3 or 1/4. It is the share of Hadoop's working space dedicated to storing intermediate results of the Map phase.
Example: With no compression (c = 1), a replication factor of 3, and an intermediate factor of 0.25 = 1/4:

H = 1 × 3 × S / (1 - 1/4) = 3S / (3/4) = 4S

With the assumptions above, the Hadoop storage is estimated to be 4 times the initial data size.
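
If it helps, the formula above can be sketched in a few lines of Python. This is only an illustration: the function names and the 100 TB / 0.5 TB-per-day figures are made up for the example and are not from any Hadoop tool, and the projection assumes a flat daily volume over 365-day years.

```python
def total_data_size(historical_tb, daily_tb, years):
    """S: historical data plus a daily increment projected over `years`.

    Assumes a constant daily volume and 365-day years (an assumption).
    """
    return historical_tb + daily_tb * 365 * years


def hadoop_storage(c, r, s, i):
    """H = c * R * S / (1 - i), in the same unit as s."""
    if not 0 <= i < 1:
        raise ValueError("intermediate factor i must be in [0, 1)")
    return c * r * s / (1 - i)


# Example from the post: no compression, replication 3, i = 1/4.
print(hadoop_storage(c=1, r=3, s=1.0, i=0.25))  # 4.0, i.e. 4 times S

# Hypothetical inputs: 100 TB historical, 0.5 TB/day for 3 years.
s = total_data_size(historical_tb=100, daily_tb=0.5, years=3)
print(s)                                        # 647.5 (TB)
print(hadoop_storage(c=1, r=3, s=s, i=0.25))    # 2590.0 (TB)
```

The result is raw storage only. How many nodes that translates to depends on the disk capacity per node, which the formula does not cover.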

Cloudera Community

How to calculate the Hadoop cluster size?