How to decide how many worker nodes we should have in a cluster?

Expert Contributor


I want to create a Hadoop cluster. How should I decide on the cluster size and its configuration?

On what basis do we decide cluster configuration details such as:

  1. The number of nodes we need
  2. The RAM on each node
  3. How many masters and slaves we need

Is there a formula to calculate this?


Super Collaborator

While I am not aware of any formula, there is at least a guide available:

In principle it says about 24-48 GB of RAM per data node. For the name node, 64 GB is supposed to be enough to handle 100 million files.
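To make the name node figure easier to apply, here is a minimal sketch that scales it to other file counts. It assumes memory grows linearly with the number of files, which is my own simplification and not something the guide states; the function name is hypothetical.

```python
def estimate_namenode_ram_gb(num_files,
                             reference_files=100_000_000,
                             reference_ram_gb=64.0):
    """Scale the '64 GB for 100 million files' figure to another file count.

    Assumption: name node memory grows linearly with the number of files.
    Treat the result as a starting point, not a guarantee.
    """
    if num_files < 0:
        raise ValueError("num_files must not be negative")
    return reference_ram_gb * num_files / reference_files


if __name__ == "__main__":
    # Rough estimate for a cluster expected to hold 250 million files.
    print(estimate_namenode_ram_gb(250_000_000))
```

If your files are mostly small, the file count grows faster than the data volume, so it is worth estimating the count and not just the total size.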

Otherwise, for real use (not testing or demonstration), I would recommend starting with at least 3 master nodes and 12 slave nodes, and then adding slave nodes as your workload requires. A typical allocation is 2 GB of RAM per MR task, which gives you a rule of thumb for how many slave nodes to add.
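The 2 GB-per-task rule of thumb can be turned into a rough node count. This is only a sketch under assumptions: the function name, the number of concurrent tasks, and the RAM reserved per node for the OS and Hadoop daemons (8 GB here) are hypothetical inputs that you would replace with your own figures. The 12-node floor comes from the recommendation above.

```python
import math


def estimate_slave_nodes(concurrent_tasks,
                         ram_per_node_gb,
                         ram_per_task_gb=2.0,
                         reserved_gb_per_node=8.0,
                         min_nodes=12):
    """Estimate the slave node count from the RAM needed by MR tasks.

    Assumptions (not from the guide): reserved_gb_per_node is set aside
    for the OS and Hadoop daemons, and RAM is the only limiting resource.
    """
    usable_gb = ram_per_node_gb - reserved_gb_per_node
    if usable_gb < ram_per_task_gb:
        raise ValueError("node has too little RAM for even one task")
    # Whole tasks that fit on one node.
    tasks_per_node = int(usable_gb // ram_per_task_gb)
    # Nodes needed to run all tasks at once, rounded up.
    needed = math.ceil(concurrent_tasks / tasks_per_node)
    return max(needed, min_nodes)


if __name__ == "__main__":
    # 400 concurrent MR tasks on 48 GB nodes: 20 tasks fit per node.
    print(estimate_slave_nodes(400, 48))
```

If the slaves also run other services, raise `reserved_gb_per_node` accordingly; fewer tasks then fit per node and the estimate goes up.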

To size the cluster more precisely, the expected use has to be known, i.e. will you run only MR, or also HBase, or do you need stream processing, and so on. The more applications that run on the slaves, the more RAM you will probably need beyond what the MR tasks use, which means additional nodes. It is also possible to run separate clusters for stream processing and for Hadoop storage.