Created 09-06-2016 09:34 PM
Should 3 be sufficient for a 3-rack cluster with one ZK per rack? Does increasing ZK nodes to 5 make sense? My understanding is that for fault tolerance, 3 ZKs are good enough, and having 2 ZK nodes on the same rack doesn't increase HA.
Created 09-06-2016 10:23 PM
Well, this really depends on your tolerance for failure. ZooKeeper requires a quorum of servers to be up at any time, and it uses a majority quorum to make decisions. ZooKeeper is up as long as floor(N/2) + 1 servers are up, where N is the total number of servers in the ensemble. A 3-node ensemble can tolerate one failure; a 5-node ensemble can tolerate up to two. The reason I would recommend 5 ZooKeeper nodes in your case is that you have a 100-node cluster. To ensure business continuity and confidently tolerate a couple of failures, it's better to go with 5 ZooKeepers.
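The quorum arithmetic above can be sketched in a few lines (a minimal illustration, not anything ZooKeeper-specific; the function names are my own):

```python
# Majority-quorum arithmetic for an ensemble of n voting servers.

def quorum(n: int) -> int:
    """Smallest strict majority of n voters: floor(n/2) + 1."""
    return n // 2 + 1

def tolerance(n: int) -> int:
    """How many server failures the ensemble survives
    while a strict majority remains up."""
    return n - quorum(n)

for n in (3, 4, 5):
    print(f"{n} servers: quorum={quorum(n)}, tolerates {tolerance(n)} failure(s)")
```

Note that 4 servers need a quorum of 3 and therefore tolerate only one failure, the same as 3 servers, which is why ensembles are almost always sized with an odd number of nodes.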
Also, think about planned maintenance. With five ZooKeepers, you can take one out for maintenance and still tolerate one more failure. With three, taking one down for maintenance leaves you with no failure tolerance at all.
That being said, now that you know the implications of 3 vs 5 ZooKeepers, you can still decide to go with three, knowing that after one ZooKeeper failure you have a limited window to bring the failed node back up, because one more failure puts the business at risk.
Created 09-07-2016 09:20 PM
The more, the better, to some extent.
Created 05-14-2018 09:03 AM
What if I set up 4 ZooKeeper nodes? Would the quorum be ceil(4/2) = 2, or shouldn't it be (N+1)/2 = 3?
Created 09-07-2016 01:11 AM
@mqureshi's recommendations are correct. If you have good monitoring in place (and you must have it), 3 ZooKeepers should be enough. If one fails, the remaining two still form a quorum, but you have no margin left: one more failure and the ensemble is down. With five, one failure still leaves you able to tolerate another. As you can see, 5 is better than 3 only when 2 fail at the same time, which is unlikely provided you have real-time monitoring and recovery. I would add that while you can share ZooKeepers across multiple services in the Data Platform, some organizations prefer to allocate ZooKeepers specifically to their Kafka cluster. In that case you would have 3 ZooKeepers for Kafka (and probably Storm, since it is a quite common combo) and 3 ZooKeepers for the other services.
Anyhow:
- monitor the state of your zookeepers
- put in place an automated recovery
- use 5 ZooKeepers if that makes you more comfortable than 3