The following formula can be used to estimate the storage size of a Hadoop cluster:

H = c * r * S / (1 - i)

where:
c = average compression ratio. This depends on the type of compression used and the nature of the data; when no compression is used, c is 1.
r = replication factor. It is set to 3 by default in production clusters.
S = size of the data to be moved to Hadoop. This can be a combination of historical data and incremental data; the incremental data can be, for example, daily volumes projected over a period of time (say, 3 years).
i = intermediate factor, usually 1/3 or 1/4. This is the fraction of the cluster's working space dedicated to storing intermediate results of the Map phase.
Example: with no compression (c = 1), a replication factor of 3, and an intermediate factor of 1/4:

H = (1 * 3 * S) / (1 - 1/4) = 3S / (3/4) = 4S

With these assumptions, the required Hadoop storage is estimated at 4 times the initial data size.
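For anyone who wants to plug in their own numbers, here is a minimal Python sketch of the formula above. The function name hadoop_storage and the 100 TB example figure are illustrative assumptions, not part of the original formula.

    def hadoop_storage(s, c=1.0, r=3, i=0.25):
        """Estimate required Hadoop cluster storage (same units as s).

        c: average compression ratio (1 = no compression)
        r: replication factor (3 by default in production)
        i: intermediate factor (Map-phase working space, usually 1/4 or 1/3)
        """
        return (c * r * s) / (1 - i)

    # Example from the post: c = 1, r = 3, i = 1/4 gives H = 4S.
    print(hadoop_storage(100))  # 100 TB of raw data -> 400.0 TB of cluster storage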