Created 08-20-2024 02:37 PM
we have cluster with 12 Kafka machines and 3 zookeeper servers on Linux servers
- Kafka version is 2.7 version , ( broker and controller are co-hosted on the same PID )
as known Kafka have 2 important logs and they are **server.log** and **controller.log**
about **controller.log** , when we look on this log we can see the following words - "`Shutdown completed`" in the log
[2024-08-20 21:42:01,582] INFO [ControllerEventThread controllerId=1001] Shutdown completed (kafka.controller.ControllerEventManager$ControllerEventThread)
the first thinking when we see the messages about "`Shutdown completed`" - is like this message is "bad" and why controller stopped ...
but when we look on all machines most of the machines have this message as - `Shutdown completed` (`kafka.controller.ControllerEventManager$ControllerEventThread`)
**but on other hand** only one controller should be active from all brokers and maybe the messages as "`Shutdown completed`" are only indicate that controllers that are not active are in standby state and therefore are in state of - `Shutdown completed` ?
for example - here one of the log from one broker machine
[2024-08-20 21:23:18,084] DEBUG [Controller id=1001] Broker 1007 was elected as controller instead of broker 1001 (kafka.controller.KafkaController)
org.apache.kafka.common.errors.ControllerMovedException: Controller moved to another broker. Aborting controller startup procedure
[2024-08-20 21:33:51,281] DEBUG [Controller id=1001] Broker 1005 was elected as controller instead of broker 1001 (kafka.controller.KafkaController)
org.apache.kafka.common.errors.ControllerMovedException: Controller moved to another broker. Aborting controller startup procedure
[2024-08-20 21:42:01,581] INFO [ControllerEventThread controllerId=1001] Shutting down (kafka.controller.ControllerEventManager$ControllerEventThread)
[2024-08-20 21:42:01,582] INFO [ControllerEventThread controllerId=1001] Shutdown completed (kafka.controller.ControllerEventManager$ControllerEventThread)
[2024-08-20 21:42:01,582] INFO [ControllerEventThread controllerId=1001] Stopped (kafka.controller.ControllerEventManager$ControllerEventThread)
[2024-08-20 21:42:01,582] DEBUG [Controller id=1001] Resigning (kafka.controller.KafkaController)
[2024-08-20 21:42:01,583] DEBUG [Controller id=1001] Unregister BrokerModifications handler for Set() (kafka.controller.KafkaController)
[2024-08-20 21:42:01,604] INFO [PartitionStateMachine controllerId=1001] Stopped partition state machine (kafka.controller.ZkPartitionStateMachine)
[2024-08-20 21:42:01,608] INFO [ReplicaStateMachine controllerId=1001] Stopped replica state machine (kafka.controller.ZkReplicaStateMachine)
[2024-08-20 21:42:01,608] INFO [Controller id=1001] Resigned (kafka.controller.KafkaController)
[2024-08-20 21:43:45,196] INFO [ControllerEventThread controllerId=1001] Starting (kafka.controller.ControllerEventManager$ControllerEventThread)
[2024-08-20 21:43:45,208] DEBUG [Controller id=1001] Broker 1005 has been elected as the controller, so stopping the election process. (kafka.controller.KafkaController)
[2024-08-20 21:52:28,400] DEBUG [Controller id=1001] Broker 1001 was elected as controller instead of broker 1001 (kafka.controller.KafkaController)
org.apache.kafka.common.errors.ControllerMovedException: Controller moved to another broker. Aborting controller startup procedure
<---- LOG IS ENDED HERE
so the question is - can we ignore the messages as `INFO [ControllerEventThread controllerId=1001] Shutdown completed (kafka.controller.ControllerEventManager$ControllerEventThread)`
of maybe something is wrong with the Kafka controller ?
Created 11-15-2024 11:50 PM
@mike_bronson7 these messages indicates that the Controller is being elected on other node and the current broker who is acting as a controller initiating a clean shutdown to handover the controller responsibilities to the newly elected controller. These messages are normal, but if you are facing frequent controller failures / elections / controller switches to other nodes then this could be a concern.
As long as if this is happening for a genuine reason such as Cluster restart / controller broker restarts. This is valid. Also, if the Controller broker disconnects from Zookeeper and looses the Controller Znode then the other brokers participate on controller election, that would also trigger these messages