While I am not aware of any formula, there is at least a guide available:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_cluster-planning/index.html
In principle it recommends about 24-48 GB of RAM per data node. For the name node, 64 GB is supposed to handle about 100 million files.
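To get a rough feel for the name node figure, here is a tiny back-of-the-envelope sketch (purely illustrative; it just scales the "64 GB per 100 million files" guideline linearly, which is only an approximation):

    # Rough name node heap estimate, scaled linearly from the
    # "64 GB handles ~100 million files" guideline above.
    def namenode_heap_gb(num_files, gb_per_100m_files=64):
        return gb_per_100m_files * num_files / 100_000_000

    print(namenode_heap_gb(250_000_000))  # ~160 GB for 250 million files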
Otherwise my recommendation for real use (not a test or demonstration) would be to start with at least 3 master nodes and 12 slave nodes, and to increase the number of slave nodes as your use case requires. A typical MR task uses about 2 GB of RAM, so that can serve as a rule of thumb for how many slave nodes you should add.
To be more precise on the sizing, the expected workload needs to be known, i.e. will you just run MR, or also HBase, or do you need stream processing, etc. The more applications running on the slaves, the more RAM you will probably need besides the MR tasks, which results in additional nodes. It is also possible to have separate clusters for stream processing and for the Hadoop storage.
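To turn the 2 GB per MR task rule of thumb into a node count, a simple estimate could look like the sketch below (illustrative only; the RAM per node, the per-node reservation for the OS and extra services, and the number of concurrent tasks are assumptions you would replace with your own figures):

    import math

    # Back-of-the-envelope slave node count from the "~2 GB RAM per MR task"
    # rule of thumb. Reserve some RAM per node for the OS and any extra
    # services (HBase region servers, stream processing workers, ...).
    def estimate_slave_nodes(concurrent_tasks, ram_per_node_gb=48,
                             gb_per_task=2, os_and_services_gb=8):
        usable_gb = ram_per_node_gb - os_and_services_gb
        tasks_per_node = usable_gb // gb_per_task
        # keep the 12-node floor suggested above
        return max(12, math.ceil(concurrent_tasks / tasks_per_node))

    print(estimate_slave_nodes(400))  # 400 concurrent tasks -> 20 nodes with these assumptions

The more services you co-locate on the slaves, the larger the reserved share becomes and the fewer tasks fit per node, which is exactly why additional nodes (or a separate cluster) may be needed.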