No controller for Kafka cluster after kerberos

New Contributor

The cluster was kerberized. After disabling Kerberos for a fix and then kerberizing again, Kafka topics do not get in sync:

Topic:ATLAS_HOOK PartitionCount:1 ReplicationFactor:2 Configs:
Topic: ATLAS_HOOK Partition: 0 Leader: 1003 Replicas: 1001,1003 Isr: 1003

Topic:ATLAS_ENTITIES PartitionCount:1 ReplicationFactor:2 Configs:
Topic: ATLAS_ENTITIES Partition: 0 Leader: -1 Replicas: 1001,1002 Isr: 1001
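To spot this kind of state across many topics, the describe output above can be scanned for partitions with no leader or with replicas missing from the ISR. This is only a minimal sketch: it assumes the line layout shown above, and the function name and regular expression are hypothetical helpers, not part of any Kafka tool.

```python
import re

# Matches per-partition lines only; the "PartitionCount" summary lines do not match.
PARTITION_LINE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(-?\d+)"
    r"\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)

def find_unhealthy_partitions(describe_output):
    """Return (topic, partition, issues) for partitions that look unhealthy."""
    problems = []
    for line in describe_output.splitlines():
        m = PARTITION_LINE.search(line)
        if m is None:
            continue
        topic, partition, leader, replicas, isr = m.groups()
        replica_ids = set(replicas.split(","))
        isr_ids = set(filter(None, isr.split(",")))
        issues = []
        if int(leader) < 0:
            # Leader: -1 means no broker currently leads the partition.
            issues.append("no leader")
        missing = sorted(replica_ids - isr_ids)
        if missing:
            issues.append("not in ISR: " + ",".join(missing))
        if issues:
            problems.append((topic, int(partition), issues))
    return problems

# Sample input copied from the output in this post.
SAMPLE = """\
Topic:ATLAS_HOOK PartitionCount:1 ReplicationFactor:2 Configs:
Topic: ATLAS_HOOK Partition: 0 Leader: 1003 Replicas: 1001,1003 Isr: 1003
Topic:ATLAS_ENTITIES PartitionCount:1 ReplicationFactor:2 Configs:
Topic: ATLAS_ENTITIES Partition: 0 Leader: -1 Replicas: 1001,1002 Isr: 1001
"""

for topic, partition, issues in find_unhealthy_partitions(SAMPLE):
    print(topic, partition, "; ".join(issues))
```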

The controller log shows the following error:

[2018-07-26 02:50:04,971] WARN Failed to parse the controller info as json. Probably this controller is still using the old format [null] to store the broker id in zookeeper (kafka.controller.KafkaController$)
[2018-07-26 02:50:04,972] ERROR [controller-event-thread]: Error processing event Startup (kafka.controller.ControllerEventManager$ControllerEventThread)
kafka.common.KafkaException: Failed to parse the controller info: null. This is neither the new or the old format.
at kafka.controller.KafkaController$.parseControllerId(KafkaController.scala:147)
at kafka.controller.KafkaController.getControllerID(KafkaController.scala:1198)
at kafka.controller.KafkaController.elect(KafkaController.scala:1662)
at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1581)
at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:53)
at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:53)
at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:53)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:52)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
Caused by: java.lang.NumberFormatException: null
at java.lang.Integer.parseInt(Integer.java:542)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:273)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
... 10 more
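The WARN and ERROR messages above suggest a two-step parse: first as JSON, then as a bare broker id. The sketch below imitates that behaviour to show why a null payload fails both steps. It is modeled on the log messages, not copied from the Kafka source, and the "brokerid" field name and the sample payloads are assumptions for illustration.

```python
import json

def parse_controller_id(payload):
    """Parse a controller payload: JSON first, then a bare integer."""
    try:
        # Assumed newer format: a JSON object carrying a "brokerid" field.
        return int(json.loads(payload)["brokerid"])
    except (TypeError, ValueError, KeyError):
        pass  # fall through to the older format
    try:
        # Assumed older format: the broker id stored as a plain number.
        return int(payload)
    except (TypeError, ValueError) as exc:
        raise ValueError(
            f"Failed to parse the controller info: {payload!r}"
        ) from exc

print(parse_controller_id('{"version":1,"brokerid":1003,"timestamp":"0"}'))
print(parse_controller_id("1001"))
try:
    parse_controller_id(None)  # a payload with no data fails both steps
except ValueError as err:
    print(err)
```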

Running get /controller in zkCli returns "null".

I also tried creating topics with "kafka" as the principal, but they do not get a leader.

Any clues?

Thank you!

1 ACCEPTED SOLUTION

New Contributor

Answering my own question: something seems to happen during this re-kerberization procedure (reproducible on two clusters) that leaves the /controller znode with null data. I deleted the znode and restarted the brokers. A new controller was elected and the replicas caught up and rejoined the ISRs. Hope this proves useful to someone.

REPLIES

New Contributor

Another +1 from me for the response. I spent a couple of hours investigating and comparing the configuration with a working cluster before I removed the path. The worst part is that I could not find any indication in the Kafka or ZooKeeper logs that something was wrong.

Super Collaborator

@Ricardo Junior

Thanks for your answer to yourself 🙂 It helped me after many many many hours of Kafka debugging.

BTW, in my case it was exactly the same scenario: Kerberos -> De-Kerberize -> Re-Kerberize

Thanks