Kafka machines in the cluster and Kafka communications


We have a Kafka cluster with 3 Kafka broker nodes and 3 ZooKeeper servers.

Kafka version: 10.1 (Hortonworks)

From my understanding, all metadata is located on the ZooKeeper servers, and the Kafka brokers use this data (Kafka talks with the ZooKeeper servers via port 2181).

I am wondering whether each Kafka machine talks with the other Kafka machines in the cluster, or whether the brokers only get/put data from/to the ZooKeeper servers.

So does the Kafka service need to communicate with the other Kafka brokers in the cluster? Or do the Kafka machines get everything they need only from the ZooKeeper servers?

Michael-Bronson
1 ACCEPTED SOLUTION

Master Mentor

@Michael Bronson

Apache Kafka uses ZooKeeper to elect a controller. ZooKeeper also tracks the status of the Kafka cluster nodes and plays a vital role in many other tasks, such as leader detection, configuration management, synchronization, and detecting when a node joins or leaves the cluster. It maintains cluster membership by storing configuration, including the list of topics in the cluster.
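To make "ZooKeeper stores the cluster metadata" a bit more concrete, here is a minimal sketch that models the ZooKeeper tree as a plain Python dict. The paths (/brokers/ids, /brokers/topics, /controller) follow the layout Kafka uses, but the hostnames, ports, topic name and assignments are made-up example data, the JSON payloads are trimmed to a few fields, and nothing here talks to a real ZooKeeper ensemble.

```python
import json

# Simplified, in-memory stand-in for the ZooKeeper tree Kafka uses.
# Hostnames, ports, topic and assignments are hypothetical example data;
# real znode payloads carry more fields than shown here.
znodes = {
    # Ephemeral znodes: one per live broker, removed when its session expires.
    "/brokers/ids/1": json.dumps({"host": "kafka1.example.com", "port": 9092}),
    "/brokers/ids/2": json.dumps({"host": "kafka2.example.com", "port": 9092}),
    "/brokers/ids/3": json.dumps({"host": "kafka3.example.com", "port": 9092}),
    # Persistent znode: replica assignment for each partition of a topic.
    "/brokers/topics/orders": json.dumps(
        {"version": 1, "partitions": {"0": [1, 2, 3], "1": [2, 3, 1]}}
    ),
    # Per-partition state: current leader and in-sync replicas (ISR).
    "/brokers/topics/orders/partitions/0/state": json.dumps({"leader": 1, "isr": [1, 2, 3]}),
    "/brokers/topics/orders/partitions/1/state": json.dumps({"leader": 2, "isr": [2, 3, 1]}),
    # Ephemeral znode naming the current controller.
    "/controller": json.dumps({"version": 1, "brokerid": 1}),
}


def live_brokers(store):
    """Broker ids that currently have a registration under /brokers/ids."""
    prefix = "/brokers/ids/"
    return sorted(int(path[len(prefix):]) for path in store if path.startswith(prefix))


def topic_names(store):
    """Topic names that have an entry under /brokers/topics."""
    prefix = "/brokers/topics/"
    return sorted({path[len(prefix):].split("/")[0] for path in store if path.startswith(prefix)})


def controller_id(store):
    """Broker id stored in /controller, or None if no controller is registered."""
    data = store.get("/controller")
    return json.loads(data)["brokerid"] if data is not None else None


print("live brokers:", live_brokers(znodes))
print("topics:", topic_names(znodes))
print("controller:", controller_id(znodes))
```

Because broker registrations are ephemeral, "which brokers are alive" is simply "which children exist under /brokers/ids" at that moment.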

To remain part of the Kafka cluster, each broker has to send keep-alives (heartbeats) to ZooKeeper at regular intervals; every ZooKeeper client does this by default. If ZooKeeper does not receive a heartbeat from a broker within zookeeper.session.timeout.ms (6000 ms by default), it expires the broker's session and the broker is considered dead. This triggers leader election for every partition whose leader was on that broker. If that broker happened to be the controller, you will also see a new controller elected.
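The timeout logic described above can be sketched as a small simulation. This is a simplified model, assuming we only track each broker's last heartbeat time (a real ZooKeeper ensemble negotiates and tracks sessions itself); the function names, timestamps and partition leaders are hypothetical example data.

```python
# Simplified model of broker liveness as seen through ZooKeeper sessions.
# Assumption: only the last heartbeat time per broker is tracked here.
SESSION_TIMEOUT_MS = 6000  # zookeeper.session.timeout.ms, default quoted above


def expired_brokers(last_heartbeat_ms, now_ms, timeout_ms=SESSION_TIMEOUT_MS):
    """Broker ids with no heartbeat for longer than timeout_ms."""
    return sorted(b for b, t in last_heartbeat_ms.items() if now_ms - t > timeout_ms)


def partitions_needing_election(leaders, dead_brokers):
    """(topic, partition) pairs whose current leader is on a dead broker."""
    dead = set(dead_brokers)
    return sorted(tp for tp, leader in leaders.items() if leader in dead)


# Hypothetical example data: heartbeat timestamps (ms) and partition leaders.
last_heartbeat = {1: 9_000, 2: 14_500, 3: 15_800}
leaders = {("orders", 0): 1, ("orders", 1): 2, ("orders", 2): 1}
controller = 1

now = 16_000
dead = expired_brokers(last_heartbeat, now)
print("dead brokers:", dead)
print("partitions needing a new leader:", partitions_needing_election(leaders, dead))
print("new controller needed:", controller in dead)
```

Here broker 1 has been silent for 7000 ms, so its session is treated as expired, the two partitions it led need a new leader, and since it was also the controller, a new controller is needed too.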

In a Kafka cluster, ZooKeeper provides service discovery and consensus. Service discovery helps the brokers find each other and know who is in the cluster; consensus helps them elect a cluster controller and know which partitions exist, where they are, and whether a broker is the leader of a partition or a follower that needs to replicate it.
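Controller election in Kafka works by having every broker try to create an ephemeral /controller znode; whichever create succeeds first wins, and when the winner's session expires the znode disappears and the remaining brokers try again. A minimal sketch of that idea follows. FakeZooKeeper and its methods are a hypothetical in-memory stand-in written for this example, not a real ZooKeeper client API, and the broker ordering is arbitrary.

```python
class NodeExistsError(Exception):
    """Raised when creating a znode that already exists."""


class FakeZooKeeper:
    """Hypothetical in-memory stand-in for ZooKeeper (not a real client API).

    Models only ephemeral znodes that belong to a session and vanish
    when that session expires.
    """

    def __init__(self):
        self.nodes = {}  # path -> (data, owning session id)

    def create_ephemeral(self, path, data, session):
        # Mirrors ZooKeeper semantics: create fails if the znode exists,
        # so only one creator can win.
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = (data, session)

    def get(self, path):
        entry = self.nodes.get(path)
        return entry[0] if entry is not None else None

    def expire_session(self, session):
        # Every ephemeral znode owned by the expired session is deleted.
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session}


def try_become_controller(zk, broker_id):
    """A broker tries to register itself as controller; True if it won."""
    try:
        zk.create_ephemeral("/controller", broker_id, session=broker_id)
        return True
    except NodeExistsError:
        return False


zk = FakeZooKeeper()
# All brokers race at startup; here broker 2 happens to try first.
first_round = {b: try_become_controller(zk, b) for b in [2, 1, 3]}
print("first round:", first_round, "controller:", zk.get("/controller"))

# The controller's session expires, /controller disappears, and the
# surviving brokers race again (in Kafka they learn of this via a watch).
zk.expire_session(2)
second_round = {b: try_become_controller(zk, b) for b in [3, 1]}
print("second round:", second_round, "controller:", zk.get("/controller"))
```

In each round exactly one broker's create succeeds, which is what guarantees a single controller at a time.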

A controller is not too complex: it is a normal broker that simply has an additional responsibility. It still leads partitions, has reads and writes going through it, and replicates data. Replication happens directly between brokers, with followers fetching data from the partition leader rather than through ZooKeeper. The most important part of the additional responsibility is keeping track of nodes in the cluster and appropriately handling nodes that leave, join, or fail. This includes rebalancing partitions and assigning new partition leaders.
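The "assigning new partition leaders" part of the controller's job can be sketched as follows. This is a simplified illustration, not Kafka's exact leader selector: it assumes a single failed broker, removes it from every ISR, and for each partition it was leading picks the first replica (in assignment order) that is still in the ISR. If no in-sync replica is left, the partition goes offline; unclean leader election (unclean.leader.election.enable) is ignored here. In Kafka, the controller then records the new leader and ISR in ZooKeeper and sends the updated leadership and metadata directly to the other brokers, which this sketch does not model. The function name and partition data are hypothetical.

```python
def reelect_leaders(partitions, dead_broker):
    """Simplified controller reaction to a failed broker.

    Removes dead_broker from every ISR and, for partitions it was leading,
    picks the first replica (in assignment order) still in the ISR.
    A leader of None means the partition is offline.
    """
    new_state = {}
    for tp, state in partitions.items():
        isr = [r for r in state["isr"] if r != dead_broker]
        leader = state["leader"]
        if leader == dead_broker:
            candidates = [r for r in state["replicas"] if r in isr]
            leader = candidates[0] if candidates else None
        new_state[tp] = {"replicas": state["replicas"], "leader": leader, "isr": isr}
    return new_state


# Hypothetical partition states before broker 1 fails.
partitions = {
    ("orders", 0): {"replicas": [1, 2, 3], "leader": 1, "isr": [1, 2, 3]},
    ("orders", 1): {"replicas": [2, 3, 1], "leader": 2, "isr": [2, 3, 1]},
    ("orders", 2): {"replicas": [1, 3], "leader": 1, "isr": [1]},
}

for tp, state in reelect_leaders(partitions, dead_broker=1).items():
    print(tp, "leader:", state["leader"], "isr:", state["isr"])
```

Partition 0 moves its leadership to broker 2, partition 1 keeps its leader, and partition 2 has no in-sync replica left, so in this model it becomes unavailable.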

There is always exactly one controller broker in a Kafka cluster.

HTH

@Geoffrey, thank you for the excellent answer. One more small question: could you please help me with this thread? https://community.hortonworks.com/questions/239890/kafka-what-could-be-the-reasons-for-kafka-broker-...

Michael-Bronson