Kafka: `Leader: none` on some partitions and a broker id not registered in ZooKeeper


We have 3 Kafka brokers on Linux RHEL 7.6 (3 Linux machines).

The Kafka version is 2.7.x.

The broker IDs are `1010`, `1011`, `1012`.


From the Kafka topic describe output we can see the following:

Topic: __consumer_offsets Partition: 0 Leader: none Replicas: 1011,1010,1012 Isr: 1010
Topic: __consumer_offsets Partition: 1 Leader: 1012 Replicas: 1012,1011,1010 Isr: 1012,1011
Topic: __consumer_offsets Partition: 2 Leader: 1011 Replicas: 1010,1012,1011 Isr: 1011,1012
Topic: __consumer_offsets Partition: 3 Leader: none Replicas: 1011,1012,1010 Isr: 1010
Topic: __consumer_offsets Partition: 4 Leader: 1011 Replicas: 1012,1010,1011 Isr: 1011
Topic: __consumer_offsets Partition: 5 Leader: none Replicas: 1010,1011,1012 Isr: 1010
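
To pick the affected partitions out of a longer describe output, a small script can help. The following is a minimal sketch, assuming the output keeps the `Topic: / Partition: / Leader: / Replicas: / Isr:` layout shown above. The function name `leaderless_partitions` is illustrative and not part of Kafka.

```python
import re

# Matches one partition line of the describe output shown above.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)\s+Replicas:\s*(?P<replicas>\S+)\s+Isr:\s*(?P<isr>\S*)"
)

def leaderless_partitions(describe_output):
    """Return (topic, partition, isr) for every partition whose leader is 'none'."""
    result = []
    for line in describe_output.splitlines():
        match = LINE_RE.search(line)
        if match and match.group("leader") == "none":
            isr = [int(b) for b in match.group("isr").split(",") if b]
            result.append((match.group("topic"), int(match.group("partition")), isr))
    return result

sample = """\
Topic: __consumer_offsets Partition: 0 Leader: none Replicas: 1011,1010,1012 Isr: 1010
Topic: __consumer_offsets Partition: 1 Leader: 1012 Replicas: 1012,1011,1010 Isr: 1012,1011
Topic: __consumer_offsets Partition: 3 Leader: none Replicas: 1011,1012,1010 Isr: 1010
"""

for topic, partition, isr in leaderless_partitions(sample):
    print(f"{topic}-{partition}: no leader, ISR={isr}")
```

In the output above, every leaderless partition has only broker `1010` in its ISR. With unclean leader election disabled, the leader must come from the ISR, so these partitions stay leaderless while `1010` is unavailable.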


From the ZooKeeper CLI we can see that broker id `1010` is not registered:

[zk: localhost:2181(CONNECTED) 10] ls /brokers/ids
[1011, 1012]
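
The same comparison can be scripted when there are more brokers to check. This is a minimal sketch, assuming the result line of `ls /brokers/ids` is pasted in as a string; `missing_brokers` is a hypothetical helper name.

```python
import ast

def missing_brokers(expected_ids, zk_ls_output):
    """Return expected broker ids absent from the `ls /brokers/ids` result line."""
    registered = set(ast.literal_eval(zk_ls_output.strip()))
    return sorted(set(expected_ids) - registered)

print(missing_brokers([1010, 1011, 1012], "[1011, 1012]"))
```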

In `state-change.log` on broker `1010` we can see the following:

[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-6 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-9 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-8 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-11 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-10 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-46 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-45 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-48 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-47 as the local replica for the partition is in an offline log directory (state.change.logger)
[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-49 as the local replica for the partition is in an offline log directory (state.change.logger)
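
To get the full list of replicas reported as sitting in an offline log directory, the warnings can be grouped by broker. This is a minimal sketch, assuming the message text matches the lines above exactly; `offline_replicas` is an illustrative name.

```python
import re
from collections import defaultdict

# Matches the "offline log directory" warning from the excerpt above.
OFFLINE_RE = re.compile(
    r"\[Broker id=(?P<broker>\d+)\] Ignoring LeaderAndIsr request .*? "
    r"for partition (?P<partition>\S+) as the local replica for the partition "
    r"is in an offline log directory"
)

def offline_replicas(log_lines):
    """Map broker id -> sorted list of partitions reported in an offline log directory."""
    found = defaultdict(set)
    for line in log_lines:
        match = OFFLINE_RE.search(line)
        if match:
            found[int(match.group("broker"))].add(match.group("partition"))
    return {broker: sorted(parts) for broker, parts in found.items()}

sample = [
    "[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-6 as the local replica for the partition is in an offline log directory (state.change.logger)",
    "[2021-12-16 14:15:36,170] WARN [Broker id=1010] Ignoring LeaderAndIsr request from controller 1010 with correlation id 485 epoch 323 for partition __consumer_offsets-9 as the local replica for the partition is in an offline log directory (state.change.logger)",
]

print(offline_replicas(sample))
```

Kafka marks a log directory offline after an I/O error on it, so the underlying error for that directory in the broker's `server.log` is the thing to look for next.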


With `ls -ltr` we can see that `controller.log` and `state-change.log` have not been updated since `Dec 16`:

-rwxr-xr-x 1 root kafka 343477146 Dec 16 14:15 controller.log
-rwxr-xr-x 1 root kafka 207911766 Dec 16 14:15 state-change.log
-rw-r--r-- 1 root kafka 68759461 Dec 16 14:15 kafkaServer-gc.log.6.current
-rwxr-xr-x 1 root kafka 6570543 Dec 17 09:42 log-cleaner.log
-rw-r--r-- 1 root kafka 524288242 Dec 20 00:39 server.log.10
-rw-r--r-- 1 root kafka 524289332 Dec 20 01:37 server.log.9
-rw-r--r-- 1 root kafka 524288452 Dec 20 02:35 server.log.8
-rw-r--r-- 1 root kafka 524288625 Dec 20 03:33 server.log.7
-rw-r--r-- 1 root kafka 524288395 Dec 20 04:30 server.log.6
-rw-r--r-- 1 root kafka 524288237 Dec 20 05:27 server.log.5
-rw-r--r-- 1 root kafka 524289136 Dec 20 06:25 server.log.4
-rw-r--r-- 1 root kafka 524288142 Dec 20 07:25 server.log.3
-rw-r--r-- 1 root kafka 524288187 Dec 20 08:21 server.log.2
-rw-r--r-- 1 root kafka 524288094 Dec 20 10:52 server.log.1
-rw-r--r-- 1 root kafka 323361 Dec 20 19:50 kafkaServer-gc.log.0.current
-rw-r--r-- 1 root kafka 323132219 Dec 20 19:50 server.log
-rwxr-xr-x 1 root kafka 15669106 Dec 20 19:50 kafkaServer.out


What we have done so far:

we restarted all 3 ZooKeeper servers
we restarted all Kafka brokers

But the affected partitions still show `Leader: none`, and broker `1010` is still not registered in ZooKeeper.

 


**additional info**

[zk: localhost:2181(CONNECTED) 11] get /controller
{"version":1,"brokerid":1011,"timestamp":"1640003679634"}
cZxid = 0x4900000b0c
ctime = Mon Dec 20 12:34:39 UTC 2021
mZxid = 0x4900000b0c
mtime = Mon Dec 20 12:34:39 UTC 2021
pZxid = 0x4900000b0c
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x27dd7cf43350080
dataLength = 57
numChildren = 0


**from kafka01**

more meta.properties
#
#Tue Nov 16 07:45:36 UTC 2021
cluster.id=D3KpekCETmaNveBJzE6PZg
version=0
broker.id=1010
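
One thing worth checking here is that `broker.id` and `cluster.id` in `meta.properties` agree with the broker's configuration and with the other brokers, since as far as I know a mismatch prevents the broker from starting. The following is a minimal sketch of reading the file; `read_meta_properties` is a hypothetical helper, and the sample text is the file content shown above.

```python
def read_meta_properties(text):
    """Parse the key=value lines of a meta.properties file, skipping comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

sample = """\
#
#Tue Nov 16 07:45:36 UTC 2021
cluster.id=D3KpekCETmaNveBJzE6PZg
version=0
broker.id=1010
"""

meta = read_meta_properties(sample)
print(meta["broker.id"], meta["cluster.id"])
```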

**relevant ideas**

On the topics disk we have the following files (in addition to the topic partition directories):


-rw-r--r-- 1 root kafka 91 Nov 16 07:45 meta.properties
-rw-r--r-- 1 root kafka 161 Dec 15 16:04 cleaner-offset-checkpoint
-rw-r--r-- 1 root kafka 13010 Dec 15 16:20 replication-offset-checkpoint
-rw-r--r-- 1 root kafka 1928 Dec 17 09:42 recovery-point-offset-checkpoint
-rw-r--r-- 1 root kafka 80 Dec 17 09:42 log-start-offset-checkpoint

Any idea whether deleting one or more of the above files could help with our issue?

Michael-Bronson
Expert Contributor

Hi @mike_bronson7 

 

1. Do you see anything interesting in the broker 1010 log file? This would help us understand why 1010 is not able to register in ZooKeeper.

 

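2. Try forcing a new controller election by deleting the `/controller` znode (on ZooKeeper 3.5 and later, `rmr` is deprecated in favor of `deleteall`):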

[zk: localhost:2181(CONNECTED) 11] rmr /controller

 

3. Are these broker IDs unique? If you describe other topics, do you see the same broker IDs and the same behavior (`Leader: none` for some partitions)?

 

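4. Finally, if this is a dev environment:

 4.1 You can set `unclean.leader.election.enable=true` and restart the brokers. Note that this can lose data, because an out-of-sync replica may become leader.

 Or:

 4.2 (If this is happening only for this topic) remove the `__consumer_offsets` topic (from ZooKeeper only) and restart Kafka. Note that this discards all committed consumer offsets.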
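
For step 1, the `server.log` files are large, so filtering them by log level can save time. The following is a minimal sketch, assuming the entries follow the `[timestamp] LEVEL message` layout seen in the `state-change.log` excerpt in the question. The function names are illustrative.

```python
import re

# Each entry is assumed to start with "[timestamp] LEVEL message",
# as in the state-change.log excerpt in the question.
ENTRY_RE = re.compile(r"^\[(?P<timestamp>[^\]]+)\] (?P<level>[A-Z]+) (?P<message>.*)$")

def entries_at_level(lines, levels=("ERROR", "FATAL")):
    """Yield (timestamp, level, message) for log entries at the given levels."""
    for line in lines:
        match = ENTRY_RE.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            yield match.group("timestamp"), match.group("level"), match.group("message")

def scan_log(path, levels=("ERROR", "FATAL")):
    """Return matching entries from the log file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(entries_at_level(handle, levels))
```

A call such as `scan_log("/var/log/kafka/server.log")` would list the error entries. That path is an assumption, so substitute the actual log location on broker 1010. Stack trace lines that follow an entry are skipped by this sketch, so open the file around the reported timestamps for the full trace.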