Created 12-03-2018 07:15 PM
Hi all,
We have an Ambari cluster (HDP version 2.6.4) with the following details:
128 DataNode machines, 3 Kafka machines, 3 ZooKeeper servers, 3 master machines
We want to add 17 Kafka machines to the cluster.
So, what do we need to consider when adding 17 new Kafka machines to the cluster?
Is it possible to stay with 3 ZooKeeper servers while adding 17 Kafka machines?
Created 12-03-2018 09:33 PM
Depending on how many tenants you have using ZooKeeper, you may be just fine with 3 nodes. In ZooKeeper, more nodes don't always yield better performance because of the communication overhead. Adding more nodes decreases write performance because of the node-to-node communication required to synchronize across the cluster.
A few things to consider:
1. ZooKeeper will be more fault tolerant with 5 nodes vs 3 nodes. A 3 node cluster can only tolerate one down node before it loses its quorum.
2. Get the best performance out of your current 3 node deployment by following best practices.
3. Look at the current 3 node cluster performance under existing load and see the capacity you have. Check out this article. Add new Kafka nodes and see how performance is affected.
4. ZooKeeper should run on an odd number of nodes, and you will most likely not need more than 7.
5. Later versions of Kafka do not rely on ZooKeeper for consumer offsets. See "How Kafka uses ZooKeeper"; that article describes the 0.10 release and later.
6. Consider upgrading to HDP 3.0 and using Streams Messaging Manager. It makes managing Kafka a lot easier, but it only works on HDP 3.0 and above.
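For the "look at current performance" step, one option is ZooKeeper's `mntr` four-letter command, which reports figures such as `zk_avg_latency` and `zk_outstanding_requests`. Below is a minimal Python sketch, assuming the client port is 2181 and that four-letter commands are enabled on your servers. The function names are made up for illustration, so adjust to your environment.

```python
import socket


def parse_mntr(text):
    """Turn the tab-separated 'key<TAB>value' lines of mntr output into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # skip lines that do not contain a tab
            stats[key.strip()] = value.strip()
    return stats


def fetch_mntr(host, port=2181, timeout=5.0):
    """Send the 'mntr' command to one ZooKeeper server and return the raw reply.

    The port and timeout defaults are assumptions, not values from this thread.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def report(hosts):
    """Print role, average latency and outstanding requests for each server."""
    for host in hosts:
        try:
            stats = parse_mntr(fetch_mntr(host))
        except OSError as exc:
            print(host, "unreachable:", exc)
            continue
        print(
            host,
            stats.get("zk_server_state"),
            stats.get("zk_avg_latency"),
            stats.get("zk_outstanding_requests"),
        )
```

Calling `report([...])` with your three ZooKeeper hostnames before and after adding Kafka brokers gives a rough before/after comparison. Which values count as healthy depends on your workload.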
Best of luck on your Kafka journey!
Created 12-03-2018 09:40 PM
I have another question: if we have 20 Kafka machines and only 3 ZooKeeper servers, is it still OK to install ZooKeeper on virtual machines, or do we need physical machines?
Created 12-04-2018 07:21 PM
@Michael Bronson Physical machines are better. ZooKeeper is very sensitive to disk write latency. Virtual machines are typically connected to a networked file system, and writes to that file system can be delayed if the network connecting the VM to the networked file appliance is busy. VMs on hypervisors can also do other fancy tricks, like migrating, which can cause ZooKeeper to fail. If you do use VMs, you will need to modify some of the connection timeouts and shut off VM migration. The VMs should be hosted on different hypervisors.
Created 12-03-2018 09:49 PM
Also see Jordan's answer: https://community.hortonworks.com/questions/207947/why-kafka-should-be-en-even-number.html (in his last answer he says that it would not be a good idea to run only 3 ZooKeeper servers).
Created 12-04-2018 07:21 PM
@Michael Bronson A 3 node ZK cluster can survive the loss of one ZK node.
Created 12-03-2018 10:21 PM
I also think that when we have only 3 ZooKeeper servers and one fails, the other two ZooKeeper servers can fail because of the split-brain issue. Do you agree?
Created 12-04-2018 07:25 PM
@Michael Bronson The page Designing a ZooKeeper explains that a 3 node cluster can survive 1 node failure. A 5 node cluster can survive 2 node failures.
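The arithmetic behind those numbers is plain majority voting: an ensemble stays available as long as more than half of its configured servers are up. Here is a small Python sketch of that rule; the function names are only illustrative and not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(ensemble_size, live_nodes):
    """True if the live servers still form a majority of the ensemble."""
    return live_nodes >= quorum_size(ensemble_size)


for n in (3, 4, 5, 7):
    print(f"{n} nodes: quorum={quorum_size(n)}, tolerated failures={tolerated_failures(n)}")
```

With 3 servers and one down, the remaining 2 are still a majority of 3, so they keep serving; they do not fail just because the third one did. A 4 node ensemble tolerates only 1 failure, the same as 3 nodes, which is why odd sizes are recommended.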