What to consider when adding 17 new Kafka machines to the cluster

Hi all,

We have an Ambari cluster (HDP version 2.6.4) with the following details:

128 DataNode machines

3 Kafka machines

3 ZooKeeper servers

3 master machines

We want to add 17 Kafka machines to the cluster.

What do we need to consider when adding 17 new Kafka machines to the cluster?

Is it possible to stay with 3 ZooKeeper servers while adding the 17 Kafka machines?

Michael-Bronson
1 ACCEPTED SOLUTION

Depending on how many tenants you have using ZooKeeper, you may be just fine with 3 nodes. In ZooKeeper, more nodes don't always yield better performance: every write has to be synchronized across the ensemble, so adding nodes increases the node-to-node communication overhead and decreases write performance.

A few things to consider:

1. ZooKeeper will be more fault tolerant with 5 nodes than with 3. A 3-node cluster can tolerate only one failed node; if a second node goes down, it loses its quorum.

2. Get the best performance out of your current 3-node deployment by following best practices.

3. Look at the performance of the current 3-node cluster under the existing load and see how much capacity you have. Check out this article. Then add the new Kafka nodes and see how performance is affected.

4. ZooKeeper should run on an odd number of nodes (an even count adds no extra fault tolerance), and you will most likely not need more than 7.

5. Later versions of Kafka do not rely on ZooKeeper for consumer offsets. See "How Kafka uses ZooKeeper"; that article describes the 0.10 release and later.

6. Consider upgrading to HDP 3.0 and using Streams Messaging Manager. It makes managing Kafka a lot easier, but it only works on HDP 3.0 and above.
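One way to look at the current ZooKeeper load, as suggested in the list above, is to poll each server with the `mntr` four-letter command and compare the numbers before and after the new brokers join. The following is only a minimal sketch: the host names in the usage note are hypothetical, port 2181 is assumed to be the client port, and on newer ZooKeeper releases the four-letter commands may have to be enabled explicitly. Which metrics matter most (for example `zk_avg_latency` or `zk_outstanding_requests`) depends on your workload.

```python
import socket


def fetch_mntr(host, port=2181, timeout=5.0):
    """Send the 'mntr' four-letter command to a ZooKeeper server
    and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text):
    """Turn tab-separated 'key<TAB>value' lines into a dict of strings.
    Lines without a tab are ignored."""
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:
            metrics[key.strip()] = value.strip()
    return metrics
```

For example, `parse_mntr(fetch_mntr("zk1.example.com"))` (a hypothetical host name) returns a dictionary you can log periodically for each of the three servers while the Kafka nodes are being added.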

Best of luck on your Kafka journey!

I have another question: if we have 20 Kafka machines and only 3 ZooKeeper servers, is it still fine to install ZooKeeper on virtual machines, or do we need physical machines?

Michael-Bronson

@Michael Bronson Physical machines are better. ZooKeeper is very sensitive to disk latency, write latency in particular, because it persists each update to its transaction log. Virtual machines are typically connected to a networked file system, and writes to that file system can be delayed if the network connecting the VM to the storage appliance is busy. VMs on hypervisors can also do other fancy tricks, such as live migration, which can cause ZooKeeper to fail. If you do use VMs, you will need to adjust some of the connection timeouts and turn off VM migration. The VMs should also be hosted on different hypervisors.

Also see Jordan's answer at https://community.hortonworks.com/questions/207947/why-kafka-should-be-en-even-number.html (in his last answer he says that running only 3 ZooKeeper servers is not a good idea).

Michael-Bronson

@Michael Bronson A 3-node ZK cluster can survive the loss of one ZK node.

I also think that when we have only 3 ZooKeeper servers and one fails, the other two can fail as well because of the split-brain issue. Do you agree?

Michael-Bronson

@Michael Bronson The page Designing a ZooKeeper explains that a 3-node cluster can survive 1 node failure. A 5-node cluster can survive 2 node failures.
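
The arithmetic behind these numbers can be sketched in a few lines, assuming the usual majority-quorum rule (more than half of the ensemble must be up). The function names are illustrative only and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Number of servers that can be down while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5, 6, 7):
        print(f"{n} servers: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

With 3 servers the quorum is 2, so after one failure the two remaining servers still form a majority and keep serving. Going from 3 to 4 servers (or from 5 to 6) does not raise the number of tolerated failures, which is why odd ensemble sizes are normally chosen.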