What to consider when adding 17 new Kafka machines to the cluster

Hi all,

We have an Ambari cluster (HDP version 2.6.4) with the following machines:

128 DataNode machines

3 Kafka machines

3 ZooKeeper servers

3 master machines

We want to add 17 Kafka machines to the cluster.

What do we need to consider when adding 17 new Kafka machines to the cluster?

Is it possible to stay with 3 ZooKeeper servers while adding 17 Kafka machines?

Michael-Bronson

Accepted Solution

Re: what need to consider when adding new 17 kafka machines to the cluster

Depending on how many tenants you have using ZooKeeper, you may be just fine with 3 nodes. In ZooKeeper, more nodes don't always yield better performance because of the communication overhead. Adding more nodes decreases write performance because of the node-to-node communication required to synchronize across the cluster.

A few things to consider:

1. ZooKeeper will be more fault tolerant with 5 nodes than with 3. A 3-node cluster can tolerate only one down node before it loses its quorum.

2. Get the best performance out of your current 3-node deployment by following best practices.

3. Look at the performance of the current 3-node cluster under the existing load and see how much capacity you have. Check out this article. Then add the new Kafka nodes and see how performance is affected.

4. ZooKeeper needs an odd number of nodes, and you will most likely not need more than 7.

5. Later versions of Kafka do not rely on ZooKeeper for consumer offsets. See the article "How Kafka uses ZooKeeper", which describes the 0.10 release and later.

6. Consider upgrading to HDP 3.0 and using Streams Messaging Manager. It makes managing Kafka a lot easier, but it only works on HDP 3.0 and above.

Best of luck on your Kafka journey!

Re: what need to consider when adding new 17 kafka machines to the cluster

I have another question: if we have 20 Kafka machines and only 3 ZooKeeper servers, is it still acceptable to install ZooKeeper on virtual machines, or do we need physical machines?

Michael-Bronson

Re: what need to consider when adding new 17 kafka machines to the cluster

@Michael Bronson Physical machines are better. ZooKeeper is very sensitive to disk latency. Virtual machines are typically connected to a networked file system, and writes to that file system can be delayed if the network connecting the VM to the storage appliance is busy. Hypervisors can also do other fancy tricks, such as migrating a running VM, which can cause ZooKeeper to fail. If you do use VMs, you will need to modify some of the connection timeouts and shut off VM migration. The VMs should also be hosted on different hypervisors.
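
To make the timeout advice a little more concrete: ZooKeeper measures its timeouts in ticks of `tickTime` milliseconds, and when `minSessionTimeout` and `maxSessionTimeout` are not set, a server negotiates client session timeouts into the range 2 x tickTime to 20 x tickTime. The sketch below only illustrates that arithmetic. The function name and the sample values are hypothetical and are not tuning recommendations for this cluster.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_ms=None, max_ms=None):
    """Clamp a requested session timeout into the server's allowed range.

    tick_time_ms=2000 is an assumed example value, not a recommendation.
    When min_ms/max_ms are None, the ZooKeeper defaults of 2 and 20 ticks apply.
    """
    low = 2 * tick_time_ms if min_ms is None else min_ms
    high = 20 * tick_time_ms if max_ms is None else max_ms
    return max(low, min(requested_ms, high))


# A client (for example a Kafka broker) asks for different session timeouts.
for requested in (1000, 30000, 60000):
    print(requested, "->", negotiated_session_timeout(requested))
```

If a longer session timeout is needed on VMs, the requested value only takes effect when it fits inside the server-side range, so both sides may have to be adjusted together.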

Re: what need to consider when adding new 17 kafka machines to the cluster

Also see Jordan's answer at https://community.hortonworks.com/questions/207947/why-kafka-should-be-en-even-number.html (in his last answer he says that it is not a good idea to run only 3 ZooKeeper servers).

Michael-Bronson

Re: what need to consider when adding new 17 kafka machines to the cluster

@Michael Bronson A 3 node ZK cluster can survive the loss of one ZK node.

Re: what need to consider when adding new 17 kafka machines to the cluster

I also think that when we have only 3 ZooKeeper servers and one fails, the other two can fail as well because of the split-brain issue. Do you agree?

Michael-Bronson

Re: what need to consider when adding new 17 kafka machines to the cluster

@Michael Bronson The page Designing a ZooKeeper explains that a 3-node cluster can survive 1 node failure. A 5-node cluster can survive 2 node failures.
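
The failure counts quoted in this thread follow from simple majority arithmetic: an ensemble stays available while a strict majority of its servers can still talk to each other. A minimal sketch of that calculation (the function names are illustrative, not part of any ZooKeeper API):

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Number of servers that can be down while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5, 6, 7):
    print(f"{n} servers: quorum {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

An ensemble of 4 tolerates no more failures than an ensemble of 3, which is why odd sizes are recommended. With 3 servers and one down, the remaining two still form a majority and keep serving; a side that holds only a minority cannot form a quorum.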
