Created 07-25-2018 04:35 PM
We have heard that the number of Kafka machines should be odd to avoid split-brain scenarios.
Can we get more information about this?
Why should Kafka use an odd number of machines?
We want to create the following Ambari cluster based on HDP version 2.6.5:
master machines - 3
kafka machines - 17
worker machines - 160
Created 07-25-2018 06:58 PM
There is no such rule for Kafka Brokers.
ZooKeeper needs to maintain a quorum: a strict majority of floor(n/2) + 1 machines (out of n total) that agree on leader-election values and locks. An odd total is used because it tolerates as many hardware and network failures as the next larger even total, with one fewer server.
According to "Kafka: The Definitive Guide", as well as the Apache ZooKeeper site, you will generally see negative side effects from running more than 5 or 7 ZooKeeper servers in total for the applications using them: every write must be acknowledged by a majority, so larger ensembles get slower.
You should have more than 3 ZooKeeper servers because if one goes down, you are left with only 2. Those 2 are still a majority of 3, so the ensemble keeps working, but it can no longer survive another failure. With 5 servers, two can go down and the remaining 3 still form a majority. With 7, you can lose up to 3 ZooKeeper servers and still be good.
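To check the majority arithmetic, here is a minimal Python sketch. The function names are illustrative only, not part of any ZooKeeper or Kafka API, and it models nothing but the "strict majority" rule: for an ensemble of n servers, it computes the quorum size and how many servers can fail while a quorum remains.

```python
def quorum_size(n: int) -> int:
    """Smallest strict majority of an ensemble of n servers: floor(n/2) + 1."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Servers that can fail while the survivors still form a majority."""
    return n - quorum_size(n)


# Print the quorum size and failure tolerance for ensembles of 3 to 7 servers.
for n in range(3, 8):
    print(f"{n} servers: quorum of {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 servers, or from 5 to 6, raises the quorum size without raising the number of failures tolerated (1, 1, 2, 2, 3 for n = 3..7). That is why the extra even-numbered server buys nothing.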
Created 07-25-2018 07:37 PM
So, to summarize what you said: do you mean that we need a minimum of 5 ZooKeeper servers for 17 Kafka machines?
Or, in other words, how many ZooKeeper servers would you suggest for the following cluster nodes:
master machines - 3
kafka machines - 17
worker machines - 160
Created 07-30-2018 06:53 PM
@Michael Bronson - The terms "master/worker" don't really mean anything in Kafka terms.
17 Kafka brokers seems like a lot (we have about that many brokers in AWS handling about 2 million messages per day), but yes, a minimum of 5 ZooKeeper servers is encouraged to account for maintenance and hardware failure, as mentioned.
Created 07-31-2018 02:22 PM
What are the risks if we still use only 3 ZooKeeper servers with 17 Kafka machines?
Created 07-31-2018 10:26 PM
@Michael Bronson - Well, the obvious: with 3 ZooKeeper servers, the ensemble can only tolerate one failure. If a second ZooKeeper server stops responding while one is already down (for example, for maintenance), quorum is lost and Kafka leader election would fail. Your consumers and producers wouldn't be able to determine which broker should serve requests for a given topic partition.
Hardware fails for a variety of reasons, and it would be better if you converted two of the 160 available worker nodes to be dedicated ZooKeeper servers.
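The maintenance-plus-failure risk can be checked with a small Python sketch. The helper names are hypothetical, and the sketch models only ZooKeeper's majority rule, not Kafka's own behavior: take one server down for a planned upgrade, lose one more to a hardware fault, and see whether a majority is left.

```python
def has_quorum(ensemble_size: int, down: int) -> bool:
    """True if the servers still running form a strict majority of the ensemble."""
    up = ensemble_size - down
    return up > ensemble_size // 2


def survives_upgrade_plus_failure(ensemble_size: int) -> bool:
    """One server down for a planned upgrade plus one unplanned hardware failure."""
    planned_down = 1
    unplanned_down = 1
    return has_quorum(ensemble_size, planned_down + unplanned_down)


# Compare a 3-server ensemble with a 5-server ensemble under the same scenario.
for n in (3, 5):
    status = "OK" if survives_upgrade_plus_failure(n) else "quorum lost"
    print(n, "ZooKeeper servers ->", status)
```

With 3 servers only 1 of 3 is left, which is not a majority, so the ensemble stops serving. With 5 servers, 3 of 5 remain and the ensemble keeps working.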
Created 12-04-2018 07:16 PM
@Michael Bronson ZooKeeper is deployed on an odd number of hosts because it needs a majority of them (a quorum) to operate. A 3-node cluster can survive the loss of 1 node. It will fail if there is a simultaneous loss of 2 nodes (for example, if a node fails while another is down for an upgrade). If ZooKeeper goes down, the brokers will not operate.
The "Designing a ZooKeeper Deployment" section of the ZooKeeper Administrator's Guide explains:
"For the ZooKeeper service to be active, there must be a majority of non-failing machines that can communicate with each other. To create a deployment that can tolerate the failure of F machines, you should count on deploying 2xF+1 machines. Thus, a deployment that consists of three machines can handle one failure, and a deployment of five machines can handle two failures. Note that a deployment of six machines can only handle two failures since three machines is not a majority. For this reason, ZooKeeper deployments are usually made up of an odd number of machines."
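The 2xF+1 rule from the quote can be expressed as a short Python sketch. Both function names are illustrative assumptions, not an official API. The first computes the deployment size for a target number of failures F. The second goes the other way and gives the largest F a given deployment survives, which reproduces the quote's point that six machines handle no more failures than five.

```python
def ensemble_size_for(max_failures: int) -> int:
    """Machines needed to tolerate max_failures failures: 2F + 1."""
    return 2 * max_failures + 1


def failures_survivable(n: int) -> int:
    """Largest F such that the n - F survivors are still a strict majority of n."""
    return (n - 1) // 2


# Deployment sizes for tolerating 1, 2 and 3 failures: 3, 5 and 7 machines.
for f in (1, 2, 3):
    print(f"tolerate {f}: deploy {ensemble_size_for(f)} machines")

# The quote's six-machine case: 6 machines survive only 2 failures, the same as 5.
print(failures_survivable(6), failures_survivable(5))  # 2 2
```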