Kafka active controllers failing

Hi,

 

I installed Kafka on a Cloudera cluster, and it was pretty stable until this week. The first time the leader failed, I increased the replication factor of the NavigatorAuditEvents topic from the default of 1 to 3. It appears that the __consumer_offsets partitions keep dropping out for some reason. I can't figure out the regex syntax in the log search tool, and I can't read the logs directly because they are in a secured location (/var/logs/).

I would like to know how to investigate this problem and how to bring a failed active controller back up. The current state of the topics is listed below:

Topic:NavigatorAuditEvents PartitionCount:1 ReplicationFactor:3 Configs:
Topic: NavigatorAuditEvents Partition: 0 Leader: 358 Replicas: 358,357,359 Isr: 357,359,358
Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:3 Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
Topic: __consumer_offsets Partition: 0 Leader: 356 Replicas: 356,361,362 Isr: 356,362,361
Topic: __consumer_offsets Partition: 1 Leader: 357 Replicas: 357,362,356 Isr: 356,357,362
Topic: __consumer_offsets Partition: 2 Leader: 358 Replicas: 358,356,357 Isr: 356,358,357
Topic: __consumer_offsets Partition: 3 Leader: 359 Replicas: 359,357,358 Isr: 357,359,358
Topic: __consumer_offsets Partition: 4 Leader: 360 Replicas: 360,358,359 Isr: 359,358,360
Topic: __consumer_offsets Partition: 5 Leader: 361 Replicas: 361,359,360 Isr: 359,361,360
Topic: __consumer_offsets Partition: 6 Leader: 361 Replicas: 362,360,361 Isr: 361,360
Topic: __consumer_offsets Partition: 7 Leader: 356 Replicas: 356,362,357 Isr: 356,357,362
Topic: __consumer_offsets Partition: 8 Leader: 357 Replicas: 357,356,358 Isr: 356,358,357
Topic: __consumer_offsets Partition: 9 Leader: 358 Replicas: 358,357,359 Isr: 357,359,358
Topic: __consumer_offsets Partition: 10 Leader: 359 Replicas: 359,358,360 Isr: 360,358,359
Topic: __consumer_offsets Partition: 11 Leader: 360 Replicas: 360,359,361 Isr: 359,361,360
Topic: __consumer_offsets Partition: 12 Leader: -1 Replicas: 361,360,362 Isr: 362
Topic: __consumer_offsets Partition: 13 Leader: 356 Replicas: 362,361,356 Isr: 356,361
Topic: __consumer_offsets Partition: 14 Leader: 356 Replicas: 356,357,358 Isr: 356,358,357
Topic: __consumer_offsets Partition: 15 Leader: 357 Replicas: 357,358,359 Isr: 359,358,357
Topic: __consumer_offsets Partition: 16 Leader: 358 Replicas: 358,359,360 Isr: 360,358,359
Topic: __consumer_offsets Partition: 17 Leader: 359 Replicas: 359,360,361 Isr: 361,359,360
Topic: __consumer_offsets Partition: 18 Leader: 360 Replicas: 360,361,362 Isr: 361,360,362
Topic: __consumer_offsets Partition: 19 Leader: 361 Replicas: 361,362,356 Isr: 356,361,362
Topic: __consumer_offsets Partition: 20 Leader: 356 Replicas: 362,356,357 Isr: 356,357
Topic: __consumer_offsets Partition: 21 Leader: 356 Replicas: 356,358,359 Isr: 359,356,358
Topic: __consumer_offsets Partition: 22 Leader: 357 Replicas: 357,359,360 Isr: 359,357,360
Topic: __consumer_offsets Partition: 23 Leader: 358 Replicas: 358,360,361 Isr: 361,360,358
Topic: __consumer_offsets Partition: 24 Leader: 359 Replicas: 359,361,362 Isr: 359,361
Topic: __consumer_offsets Partition: 25 Leader: 360 Replicas: 360,362,356 Isr: 356,360,362
Topic: __consumer_offsets Partition: 26 Leader: 361 Replicas: 361,356,357 Isr: 356,361,357
Topic: __consumer_offsets Partition: 27 Leader: 358 Replicas: 362,357,358 Isr: 358,357
Topic: __consumer_offsets Partition: 28 Leader: 356 Replicas: 356,359,360 Isr: 356,359,360
Topic: __consumer_offsets Partition: 29 Leader: 357 Replicas: 357,360,361 Isr: 361,357,360
Topic: __consumer_offsets Partition: 30 Leader: -1 Replicas: 358,361,362 Isr: 362
Topic: __consumer_offsets Partition: 31 Leader: 359 Replicas: 359,362,356 Isr: 356,359
Topic: __consumer_offsets Partition: 32 Leader: 360 Replicas: 360,356,357 Isr: 356,357,360
Topic: __consumer_offsets Partition: 33 Leader: 361 Replicas: 361,357,358 Isr: 358,361,357
Topic: __consumer_offsets Partition: 34 Leader: 359 Replicas: 362,358,359 Isr: 359,358
Topic: __consumer_offsets Partition: 35 Leader: 356 Replicas: 356,360,361 Isr: 356,361,360
Topic: __consumer_offsets Partition: 36 Leader: 357 Replicas: 357,361,362 Isr: 361,357,362
Topic: __consumer_offsets Partition: 37 Leader: 358 Replicas: 358,362,356 Isr: 356,358
Topic: __consumer_offsets Partition: 38 Leader: 359 Replicas: 359,356,357 Isr: 356,359,357
Topic: __consumer_offsets Partition: 39 Leader: 360 Replicas: 360,357,358 Isr: 358,360,357
Topic: __consumer_offsets Partition: 40 Leader: 361 Replicas: 361,358,359 Isr: 359,358,361
Topic: __consumer_offsets Partition: 41 Leader: 359 Replicas: 362,359,360 Isr: 359,360
Topic: __consumer_offsets Partition: 42 Leader: 356 Replicas: 356,361,362 Isr: 356,362,361
Topic: __consumer_offsets Partition: 43 Leader: 357 Replicas: 357,362,356 Isr: 356,357,362
Topic: __consumer_offsets Partition: 44 Leader: 358 Replicas: 358,356,357 Isr: 356,358,357
Topic: __consumer_offsets Partition: 45 Leader: 359 Replicas: 359,357,358 Isr: 357,359,358
Topic: __consumer_offsets Partition: 46 Leader: 360 Replicas: 360,358,359 Isr: 359,358,360
Topic: __consumer_offsets Partition: 47 Leader: 361 Replicas: 361,359,360 Isr: 359,361,360
Topic: __consumer_offsets Partition: 48 Leader: 360 Replicas: 362,360,361 Isr: 360,361
Topic: __consumer_offsets Partition: 49 Leader: 356 Replicas: 356,362,357 Isr: 356,357,362

 

I also found these errors:

[Controller id=356 epoch=41] Encountered error while electing leader for partition __consumer_offsets-12 due to: Preferred replica 361 for partition __consumer_offsets-12 is either not alive or not in the isr. Current leader and ISR: [{"leader":-1,"leader_epoch":32,"isr":[362]}]

[Controller id=356 epoch=41] Initiated state change for partition __consumer_offsets-12 from OfflinePartition to OnlinePartition failed
kafka.common.StateChangeFailedException: [Controller id=356 epoch=41] Encountered error while electing leader for partition __consumer_offsets-12 due to: Preferred replica 361 for partition __consumer_offsets-12 is either not alive or not in the isr. Current leader and ISR: [{"leader":-1,"leader_epoch":32,"isr":[362]}]
	at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:324)
	at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:163)
	at kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:110)
	at kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:109)
	at scala.collection.immutable.Set$Set1.foreach(Set.scala:94)
	at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:109)
	at kafka.controller.KafkaController.onPreferredReplicaElection(KafkaController.scala:632)
	at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerAutoLeaderRebalance$3$$anonfun$apply$13.apply(KafkaController.scala:1189)
	at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerAutoLeaderRebalance$3$$anonfun$apply$13.apply(KafkaController.scala:1182)
	at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
	at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
	at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:134)
	at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerAutoLeaderRebalance$3.apply(KafkaController.scala:1182)
	at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerAutoLeaderRebalance$3.apply(KafkaController.scala:1169)
	at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
	at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
	at kafka.controller.KafkaController.kafka$controller$KafkaController$$checkAndTriggerAutoLeaderRebalance(KafkaController.scala:1169)
	at kafka.controller.KafkaController$AutoPreferredReplicaLeaderElection$.process(KafkaController.scala:1449)
	at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:53)
	at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:53)
	at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:53)
	at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
	at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:52)
	at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
Caused by: kafka.common.StateChangeFailedException: Preferred replica 361 for partition __consumer_offsets-12 is either not alive or not in the isr. Current leader and ISR: [{"leader":-1,"leader_epoch":32,"isr":[362]}]
	at kafka.controller.PreferredReplicaPartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:157)
	at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:303)
	... 25 more
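
Since the log search tool's regex syntax is unclear, one alternative, once the log text can be exported or copied out, is to pull the relevant fields from these messages with an ordinary regular expression. This is a sketch in Python, not the syntax of the Cloudera log search tool, and it assumes the message wording is exactly as in the lines above. `parse_election_error` and `LOG_LINE` are hypothetical names used only for this example.

```python
import json
import re

# Pattern for the "Encountered error while electing leader" message shown above.
ELECTION_ERROR = re.compile(
    r"\[Controller id=(?P<controller>\d+) epoch=(?P<epoch>\d+)\] "
    r"Encountered error while electing leader for partition (?P<partition>\S+) "
    r"due to: Preferred replica (?P<preferred>\d+) for partition \S+ "
    r"is either not alive or not in the isr\. "
    r"Current leader and ISR: \[(?P<state>\{.*?\})\]"
)


def parse_election_error(line):
    """Extract controller, partition, and leader/ISR state, or None if no match."""
    m = ELECTION_ERROR.search(line)
    if m is None:
        return None
    # The bracketed state in the message is valid JSON, so json can parse it.
    state = json.loads(m.group("state"))
    return {
        "controller": int(m.group("controller")),
        "epoch": int(m.group("epoch")),
        "partition": m.group("partition"),
        "preferred_replica": int(m.group("preferred")),
        "leader": state["leader"],
        "isr": state["isr"],
    }


# The first error line from above, copied verbatim as sample input.
LOG_LINE = (
    '[Controller id=356 epoch=41] Encountered error while electing leader for '
    'partition __consumer_offsets-12 due to: Preferred replica 361 for partition '
    '__consumer_offsets-12 is either not alive or not in the isr. '
    'Current leader and ISR: [{"leader":-1,"leader_epoch":32,"isr":[362]}]'
)

info = parse_election_error(LOG_LINE)
print(info)
```

Running this over every line of an exported log and collecting the `partition` and `preferred_replica` values would show which partitions and brokers the controller keeps failing on.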

 

Any pointers would be greatly appreciated. I'm relatively new to Kafka and don't really know what to ask or how to ask it.

 

Thanks!