Created 12-19-2016 04:55 PM
I am running a Kafka cluster composed of 3 nodes. One of the nodes crashed and it has been behaving oddly since then...
The following does not return anything on the malfunctioning node:
kafka-topics.sh --describe --zookeeper mynode01:2181
However, querying the topics on the other nodes returns the expected topics.
Another thing I saw is that zookeeper seems to be missing some directories:
./zkCli.sh -server mynode01
[zk: localhost:2181(CONNECTED) 1] ls /
[controller, zookeeper]
Whereas if I check any other node it comes back with:
[zk: localhost:2181(CONNECTED) 0] ls /
[isr_change_notification, zookeeper, admin, consumers, config, controller, brokers]
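The comparison above can be made mechanical. This is a minimal sketch, assuming the two `ls /` listings have already been copied out of zkCli.sh as bracketed strings; the function names `znode_names` and `missing_znodes` are hypothetical helpers, not part of ZooKeeper or Kafka.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two "ls /" listings from zkCli.sh.

# Turn "[a, b, c]" into one znode name per line, sorted.
znode_names() {
    printf '%s\n' "$1" | tr -d '[] ' | tr ',' '\n' | sort
}

# Print znodes that appear in the healthy listing ($2)
# but are absent from the suspect listing ($1).
missing_znodes() {
    suspect=$(mktemp)
    healthy=$(mktemp)
    znode_names "$1" > "$suspect"
    znode_names "$2" > "$healthy"
    # comm -13 keeps only the lines unique to the second file.
    comm -13 "$suspect" "$healthy"
    rm -f "$suspect" "$healthy"
}

# Example call with the listings from this thread:
# missing_znodes "[controller, zookeeper]" \
#     "[isr_change_notification, zookeeper, admin, consumers, config, controller, brokers]"
```

With the two listings quoted in this post, the znodes reported as missing are admin, brokers, config, consumers and isr_change_notification.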
The logs report the following entry:
Error for partition [myqueue-1,0] to broker 1:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
I tried a couple of things already to sort this out, with no joy:
Although the cluster seems to be able to treat this node as any other and switch the roles of leader/follower with no issues... it looks like it got out of sync at some point and is not able to recover itself.
Any ideas? Thanks in advance
Created 12-19-2016 06:44 PM
It looks to me like the content in ZooKeeper is not synchronized, since you cannot get the updated information from ZooKeeper on that node. You might need to fix ZooKeeper first.
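One way to check whether each ZooKeeper server is taking part in the ensemble is the `stat` four-letter command, which reports a `Mode:` line (leader, follower or standalone). This is a minimal sketch, assuming `nc` is installed and four-letter commands are enabled on the servers (newer ZooKeeper releases may require whitelisting them); the function name `zk_mode` is hypothetical, and only `mynode01` comes from this thread.

```shell
#!/bin/sh
# Hypothetical helper: extract the server mode from the output of
# ZooKeeper's "stat" four-letter command, read on standard input.
zk_mode() {
    sed -n 's/^Mode: //p'
}

# Example call (requires nc and a reachable ZooKeeper server):
# echo stat | nc mynode01 2181 | zk_mode
```

A server that prints no mode at all, or an error instead of the usual statistics, is a candidate for closer inspection of its ZooKeeper log.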
Created 12-19-2016 06:51 PM
Hi Frank, Thanks for your reply. Yes, that makes sense... would you be able to suggest something I could try to re-sync zookeeper?
Created 12-19-2016 08:45 PM
Can you check the ZooKeeper log to see what messages it reported?
Created 12-20-2016 04:53 PM
@yeayu Have you solved the issue? If not, could you please share the ZooKeeper log, so that I can take a look at what the problem might be.
Created 12-20-2016 04:53 PM
I was able to solve the issue by stopping the zookeeper and kafka services on the affected node and removing the snapshots in the zkdata directory and the associated transaction logs in the zklog directory.
After starting zookeeper back up on the affected node, the missing znodes were re-synced.
Thanks for the help provided 🙂
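The recovery described above could be sketched as follows. This is a minimal sketch, assuming the usual ZooKeeper layout in which snapshots are `snapshot.*` files and transaction logs are `log.*` files under a `version-2` subdirectory. The function name `backup_zk_state`, the backup directory and the example paths are hypothetical. It moves the files aside instead of deleting them, which is a more cautious variation on what the post describes. Stopping and starting the services is left as comments because those commands depend on the installation.

```shell
#!/bin/sh
# Hypothetical helper: move ZooKeeper snapshots and transaction logs
# out of the way so the server resyncs from the ensemble on restart.
#   $1 = snapshot directory (zkdata in this thread)
#   $2 = transaction log directory (zklog in this thread)
#   $3 = backup directory (assumption: any location with enough space)
backup_zk_state() {
    zkdata="$1"
    zklog="$2"
    backup="$3"
    mkdir -p "$backup/zkdata" "$backup/zklog" || return 1
    # Only snapshot.* and log.* are moved; the myid file stays in place.
    for f in "$zkdata"/version-2/snapshot.*; do
        if [ -e "$f" ]; then mv "$f" "$backup/zkdata/"; fi
    done
    for f in "$zklog"/version-2/log.*; do
        if [ -e "$f" ]; then mv "$f" "$backup/zklog/"; fi
    done
    return 0
}

# Order of operations on the affected node only:
# 1. Stop Kafka, then ZooKeeper (commands depend on the installation).
# 2. backup_zk_state /path/to/zkdata /path/to/zklog /path/to/backup
# 3. Start ZooKeeper and wait for the znodes to resync, then start Kafka.
```

This only makes sense while the other ensemble members are healthy and hold a quorum, since the restarted server rebuilds its state from them.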
Created 12-20-2016 04:54 PM
Great to hear that you solved the issue.