
How to calculate the Hadoop cluster size?

How do we calculate the Hadoop cluster size for our project?



The following formula is used to estimate the storage size of a Hadoop cluster:

H = c * r * S / (1 - i)

Where:
c = average compression ratio. This depends on the type of compression used and the nature of the data. When no compression is used, c = 1.
r = replication factor. It is set to 3 by default in a production cluster.
S = size of the data to be moved to Hadoop. This could be a combination of historical data and incremental data; the incremental data can be daily, for example, and projected over a period of time (3 years, for example).
i = intermediate factor, usually 1/3 or 1/4. This is the working space Hadoop dedicates to storing intermediate results of the Map phase.
Example: with no compression (c = 1), a replication factor of 3, and an intermediate factor of 1/4:

H = 1 * 3 * S / (1 - 1/4) = 3S / (3/4) = 4S

With the assumptions above, the Hadoop storage is estimated to be 4 times the size of the initial data.
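Below is a minimal sketch of this calculation in Python. The function names, parameter names, and defaults are illustrative only (not part of any Hadoop tooling), and the sample figures are made up for demonstration:

```python
def projected_data_size(historical_tb, daily_tb, years=3):
    """Project total source data S: historical data plus daily
    incremental data accumulated over the given number of years."""
    return historical_tb + daily_tb * 365 * years

def hadoop_storage_estimate(s_tb, c=1.0, r=3, i=0.25):
    """Estimate raw Hadoop storage H (in TB) for s_tb of source data.

    c: average compression ratio (1.0 = no compression)
    r: replication factor (3 by default in a production cluster)
    i: intermediate factor (Map-phase working space, usually 1/4 or 1/3)
    """
    return (c * r * s_tb) / (1 - i)

# Example matching the assumptions above (c=1, r=3, i=1/4):
# 50 TB historical + 0.05 TB/day projected over 3 years
# = 104.75 TB of source data, needing roughly 4x that in raw storage.
s = projected_data_size(historical_tb=50, daily_tb=0.05)
print(hadoop_storage_estimate(s))  # 419.0 TB
```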