Why should the number of Kafka machines be odd?

We have heard that the number of Kafka machines should be odd to avoid split-brain scenarios.

Can we get more information about this?

Why should the number of Kafka machines be odd?

We want to create the following Ambari cluster based on HDP version 2.6.5:

master machines - 3

kafka machines - 17

worker machines - 160

Michael-Bronson

Super Collaborator

There is no such rule for Kafka Brokers.

ZooKeeper must maintain a quorum, that is, a majority of floor(n/2) + 1 of its n servers that agree on leader-election values and locks. That is why ensembles are built from an odd total number of servers, to accommodate hardware and network failure scenarios.
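To make the majority rule concrete, here is a minimal Python sketch of the arithmetic only. The function names are hypothetical, and this does not model ZooKeeper's actual election protocol.

```python
def quorum_size(n: int) -> int:
    """Smallest majority of an n-server ZooKeeper ensemble: floor(n/2) + 1."""
    return n // 2 + 1


def has_quorum(alive: int, n: int) -> bool:
    """True if the servers still up (and able to reach each other) form a majority of n."""
    return alive >= quorum_size(n)


# A 3-server ensemble keeps its quorum with 2 servers up, but not with 1.
print(has_quorum(2, 3))  # True
print(has_quorum(1, 3))  # False
```

Two disjoint groups of servers cannot both hold more than half of n. As a result, at most one side of a network partition can keep serving, and that is how the majority rule prevents split brain.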

According to "Kafka: The Definitive Guide" as well as the Apache ZooKeeper site, you will generally see negative side effects if more than 5 or 7 ZooKeeper servers in total serve the applications that use them.

You should have more than 3 ZooKeeper servers. With 3, the ensemble survives losing one server because the remaining 2 are still a majority, but a second failure (for example, while another server is down for maintenance) leaves no quorum. With 5 servers, two can go down and the remaining 3 still form a majority. With 7, you can lose up to 3 ZooKeeper servers and still be good.
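The failure tolerance for each ensemble size follows directly from the majority rule. This is a small illustrative Python sketch, and `tolerated_failures` is a hypothetical helper name:

```python
def tolerated_failures(n: int) -> int:
    """How many servers an n-server ensemble can lose and still keep a majority."""
    return n - (n // 2 + 1)  # same as (n - 1) // 2


for n in range(3, 8):
    print(f"{n} servers: quorum {n // 2 + 1}, tolerates {tolerated_failures(n)} failure(s)")
```

Running this shows that a 4th or 6th server raises the quorum size without raising the number of failures tolerated. A 7-server ensemble tolerates 3 failures.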

So, just to summarize what you said: do you mean that we need a minimum of 5 ZooKeeper servers for 17 Kafka machines?

Or, in other words:

How many ZooKeeper servers do you suggest for the following cluster nodes:

master machines - 3

kafka machines - 17

worker machines - 160

Michael-Bronson

Super Collaborator

@Michael Bronson - The terms "master/worker" don't really mean anything in Kafka.

17 Kafka brokers seems like a lot (we have about that many brokers in AWS handling about 2 million messages per day), but yes, a minimum of 5 ZooKeeper servers is encouraged to account for maintenance and hardware failure, as mentioned.

@Jordan Moore

What are the risks if we still use only 3 ZooKeeper servers with 17 Kafka machines?

Michael-Bronson

Super Collaborator

@Michael Bronson - Well, the obvious: Kafka leader election would fail as soon as a second ZooKeeper server stops responding while one is already down, because 1 of 3 servers is not a majority. Your consumers and producers wouldn't be able to determine which broker should serve requests for each topic partition.

Hardware fails for a variety of reasons, and it would be better to convert two of the 160 available worker nodes into dedicated ZooKeeper servers.
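If two worker nodes are repurposed, a five-server ensemble is declared with one `server.N=host:peerPort:electionPort` line per member in each server's zoo.cfg. The Kafka brokers then point at the ensemble through the `zookeeper.connect` property. In an Ambari-managed HDP cluster these values are normally set by Ambari rather than by hand. The sketch below just prints the lines. The hostnames zk1.example.com to zk5.example.com are hypothetical. The ports 2181 (client), 2888 (peer), and 3888 (leader election) are ZooKeeper's conventional defaults.

```python
# Hypothetical hostnames for a five-server ZooKeeper ensemble.
ZK_HOSTS = [f"zk{i}.example.com" for i in range(1, 6)]
CLIENT_PORT = 2181    # port that clients such as Kafka brokers connect to
PEER_PORT = 2888      # follower-to-leader traffic
ELECTION_PORT = 3888  # leader election traffic


def zoo_cfg_server_lines(hosts):
    """server.N lines; the same list goes into zoo.cfg on every ensemble member."""
    return [
        f"server.{i}={host}:{PEER_PORT}:{ELECTION_PORT}"
        for i, host in enumerate(hosts, start=1)
    ]


def kafka_zookeeper_connect(hosts):
    """Comma-separated host:port list for the broker's zookeeper.connect property."""
    return ",".join(f"{host}:{CLIENT_PORT}" for host in hosts)


if __name__ == "__main__":
    print("\n".join(zoo_cfg_server_lines(ZK_HOSTS)))
    print("zookeeper.connect=" + kafka_zookeeper_connect(ZK_HOSTS))
```

Each ensemble member also needs a `myid` file in its `dataDir` containing its own N from the `server.N` line.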

@Michael Bronson ZooKeeper needs a majority of its hosts to build a quorum, which is why an odd number of hosts is used. A 3-node ensemble can survive the loss of 1 node. It will fail if 2 nodes are lost at the same time (for example, if a node fails while another is down for an upgrade). If ZooKeeper goes down, the brokers will not operate.

The "Designing a ZooKeeper Deployment" section of the ZooKeeper Administrator's Guide explains:

"For the ZooKeeper service to be active, there must be a majority of non-failing machines that can communicate with each other. To create a deployment that can tolerate the failure of F machines, you should count on deploying 2xF+1 machines. Thus, a deployment that consists of three machines can handle one failure, and a deployment of five machines can handle two failures. Note that a deployment of six machines can only handle two failures since three machines is not a majority. For this reason, ZooKeeper deployments are usually made up of an odd number of machines."