Created 07-26-2018 01:32 AM
Cluster was kerberized. After disabling Kerberos for a fix and kerberizing again, Kafka topics do not get in sync:
Topic:ATLAS_HOOK  PartitionCount:1  ReplicationFactor:2  Configs:
  Topic: ATLAS_HOOK  Partition: 0  Leader: 1003  Replicas: 1001,1003  Isr: 1003
Topic:ATLAS_ENTITIES  PartitionCount:1  ReplicationFactor:2  Configs:
  Topic: ATLAS_ENTITIES  Partition: 0  Leader: -1  Replicas: 1001,1002  Isr: 1001
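For anyone checking many topics at once, the describe output above can be scanned automatically. This is a minimal sketch, assuming the output format shown in this post; the function name `unhealthy_partitions` is made up for illustration and is not part of any Kafka tooling.

```python
import re

# Matches the per-partition lines of the describe output, not the topic summary lines.
PARTITION_LINE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(-?\d+)"
    r"\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)

def unhealthy_partitions(describe_output):
    """Return (topic, partition, reason) tuples for partitions that have
    no leader or have replicas missing from the ISR."""
    problems = []
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if not match:
            continue  # summary line or unrelated text
        topic, partition, leader, replicas, isr = match.groups()
        replica_set = set(replicas.split(","))
        isr_set = set(filter(None, isr.split(",")))
        if int(leader) < 0:
            problems.append((topic, int(partition), "no leader"))
        missing = sorted(replica_set - isr_set)
        if missing:
            problems.append((topic, int(partition), "not in ISR: " + ",".join(missing)))
    return problems
```

Fed the four lines above, it should flag ATLAS_HOOK partition 0 (replica 1001 out of sync) and ATLAS_ENTITIES partition 0 (no leader, replica 1002 out of sync).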
The controller log shows the following error:
[2018-07-26 02:50:04,971] WARN Failed to parse the controller info as json. Probably this controller is still using the old format [null] to store the broker id in zookeeper (kafka.controller.KafkaController$)
[2018-07-26 02:50:04,972] ERROR [controller-event-thread]: Error processing event Startup (kafka.controller.ControllerEventManager$ControllerEventThread)
kafka.common.KafkaException: Failed to parse the controller info: null. This is neither the new or the old format.
  at kafka.controller.KafkaController$.parseControllerId(KafkaController.scala:147)
  at kafka.controller.KafkaController.getControllerID(KafkaController.scala:1198)
  at kafka.controller.KafkaController.elect(KafkaController.scala:1662)
  at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1581)
  at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:53)
  at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:53)
  at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:53)
  at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
  at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:52)
  at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
Caused by: java.lang.NumberFormatException: null
  at java.lang.Integer.parseInt(Integer.java:542)
  at java.lang.Integer.parseInt(Integer.java:615)
  at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:273)
  at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
  ... 10 more
Running get /controller in zkCli returns "null".
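To make the trace easier to read: the broker seems to try the controller znode payload as JSON first and then as a bare broker id, and the literal string "null" fits neither. The following is a rough Python approximation of that two-step parse, not Kafka's actual Scala code; the JSON shape with a `brokerid` field is my assumption about the newer format.

```python
import json

def parse_controller_id(payload):
    """Approximate the two-step parse implied by the stack trace:
    JSON with a broker id first, then a plain integer string."""
    try:
        data = json.loads(payload)
        # Assumed new format: {"version": 1, "brokerid": 1003, "timestamp": "..."}
        if isinstance(data, dict) and "brokerid" in data:
            return int(data["brokerid"])
    except (TypeError, ValueError):
        pass  # not JSON, fall through to the old format
    try:
        # Old format: the payload is just the broker id as text.
        return int(payload)
    except (TypeError, ValueError) as exc:
        raise ValueError(
            "Failed to parse the controller info: %s. "
            "This is neither the new or the old format." % payload
        ) from exc
```

Calling `parse_controller_id("null")` raises, which mirrors the NumberFormatException in the log: "null" is valid JSON but carries no broker id, and it is not an integer either.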
Also tried to create topics using "kafka" as principal, but they do not get a leader.
Any clues?
Thank you!
Created 07-26-2018 02:56 AM
Answering my own question: something seems to happen during this re-kerberization procedure (reproducible on two clusters) that leaves /controller set to null. I deleted the znode and restarted the brokers. A new controller was elected and the replicas caught up and rejoined the ISRs. Hope this proves useful to someone.
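If someone wants to script the znode cleanup, here is a minimal sketch. It assumes a ZooKeeper client object with kazoo-style `get(path)` (returning a `(bytes, stat)` tuple) and `delete(path)` methods; the helper name `clear_null_controller` is hypothetical, and authenticating the client against a kerberized ZooKeeper is not covered. The brokers still need a restart afterwards, as described above.

```python
def clear_null_controller(zk, path="/controller"):
    """Delete the controller znode only when its payload is empty or the
    literal string 'null'. Returns True if the znode was deleted.

    `zk` is assumed to expose kazoo-style get(path) -> (bytes, stat)
    and delete(path); pass in an already connected client.
    """
    data, _stat = zk.get(path)
    text = (data or b"").decode("utf-8").strip()
    if text in ("", "null"):
        zk.delete(path)
        return True
    # A healthy payload is left alone so a working controller is not disturbed.
    return False
```

The guard is deliberate: on a healthy cluster the znode holds the active controller's id, and the helper leaves it untouched.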
Created 11-21-2019 04:32 AM
Another +1 from me for the response. I spent a couple of hours investigating and comparing the configuration with a working cluster before I removed the path. The worst part is that I could not find any indication in the Kafka/ZooKeeper logs that something was wrong.
Created 12-04-2018 12:09 PM
Thanks for your answer to yourself 🙂 It helped me after many many many hours of Kafka debugging.
BTW, in my case it was exactly the same scenario: Kerberos -> De-Kerberize -> Re-Kerberize
Thanks