Support Questions


DataNode machines: how many DataNodes can we add to the cluster, and what is the limitation?


We have an old Ambari cluster, HDP version 2.6.0.x,

with 48 DataNode machines.

We want to add 160 new DataNodes.

How do we know what the limitation is? (What is the maximum number of DataNodes that should be in the cluster?)

Michael-Bronson
1 ACCEPTED SOLUTION

Master Collaborator

@Michael Bronson

The NameNode stores metadata about the data held in the DataNodes, whereas the DataNodes store the actual data. The NameNode requires RAM roughly proportional to the number of data blocks in the cluster. A good rule of thumb is to allow 1 GB of NameNode memory for every 1 million blocks stored in the distributed file system. With 100 DataNodes in a cluster, 64 GB of RAM on the NameNode provides plenty of room to grow. So a single NameNode can handle thousands of DataNodes, but there are many factors to consider: NameNode memory size, number of blocks to be stored, block replication factor, how the cluster will be used, and so on. In short, the number of DataNodes a single NameNode can handle depends on the size of the NameNode (how much metadata it can hold).
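To put the rule of thumb into numbers, here is a minimal back-of-the-envelope sketch; the helper name, data volume, and block size are illustrative assumptions, not figures from this cluster:

    # Rough NameNode heap estimate based on the ~1 GB per 1 million blocks rule of thumb above.
    # The data volume and block size below are illustrative assumptions, not cluster measurements.
    def estimate_namenode_heap_gb(logical_data_tb, block_size_mb=128):
        """Estimate NameNode heap (GB) from the logical (pre-replication) data size in TB."""
        logical_data_mb = logical_data_tb * 1024 * 1024   # TB -> MB
        blocks = logical_data_mb / block_size_mb          # approximate number of blocks
        return blocks / 1_000_000                         # ~1 GB of heap per 1 million blocks

    # Example: 2 PB (2048 TB) of data with the default 128 MB block size -> roughly 17 GB of heap
    print(round(estimate_namenode_heap_gb(2048), 1), "GB of NameNode heap (approx.)")

Actual heap usage also depends on file counts, small files, and replication bookkeeping, so treat this only as a rough lower bound when sizing the NameNode.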

Please accept the answer you found most useful


7 REPLIES



May I ask a slightly different question as well? We also want to increase Kafka to 20 machines, while we have only 3 ZooKeeper servers. When increasing the number of Kafka machines, what else needs to be considered?

Michael-Bronson

Master Collaborator

@Michael Bronson

In a normal small deployment, using 3 ZooKeeper servers is acceptable, but keep in mind that in this case you will only be able to tolerate 1 server being down. A ZooKeeper ensemble of 5 or 7 servers tolerates 2 or 3 servers down, respectively. I hope this answers your question.

Reference: https://kafka.apache.org/documentation/#zk
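For context, that tolerance comes from ZooKeeper's majority quorum: an ensemble of n servers stays available as long as more than n/2 of them are up, so it tolerates floor((n - 1) / 2) failures. A minimal sketch of the arithmetic (the helper name is just for illustration):

    # ZooKeeper majority quorum: an ensemble of n servers needs more than n/2 servers up,
    # so it can tolerate floor((n - 1) / 2) failed servers.
    def zk_failure_tolerance(ensemble_size: int) -> int:
        return (ensemble_size - 1) // 2

    for n in (3, 5, 7):
        print(f"{n} servers -> quorum of {n // 2 + 1}, tolerates {zk_failure_tolerance(n)} down")

This is also why even-sized ensembles are not recommended: 4 servers tolerate no more failures than 3.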


Our ZooKeeper servers are on VMs for now. After we increase Kafka to 20 machines, will the ZooKeeper machines have to work harder? If so, should we perhaps move ZooKeeper to physical machines with more resources?

Michael-Bronson

Master Collaborator

@Michael Bronson

As discussed, the ZooKeeper ensemble tolerates servers being down. But yes, if you are planning to scale the cluster, it is always advisable to go with more resources and more robust hardware. It is perfectly fine to move the ZooKeeper servers from VMs to physical machines with more resources.


Yes, the tolerance issue is clear now. But from your point of view, what is your suggestion once 20 Kafka machines are installed? What is the best practice here: 3 ZooKeeper servers or 5?

Michael-Bronson

Master Collaborator

My suggestion: for 20 Kafka machines you can go with 3 ZooKeeper servers.