Is there a way I can force the replica to catch up with the leader? The replica has been out of sync for over 24 hours. I tried restarting the broker and don't see any movement. I tried moving the replica to different brokers, but the reassignment got stuck. I also created an additional replica, and that command is likewise stuck waiting for the out-of-sync replica to catch up to the leader.
unclean.leader.election.enable is set to true in the cluster.
ERROR kafka.server.ReplicaFetcherThread: [ReplicaFetcher replicaId=99, leaderId=157, fetcherId=0] Error due to
kafka.common.KafkaException: Error processing data for partition dev-raw-events-35 offset 111478948 Caused by: kafka.common.UnexpectedAppendOffsetException: Unexpected offset in append to dev-raw-events-35. First offset or last offset of the first batch 111478933 is less than the next offset 111478948. First 10 offsets in append: List(111478933, 111478934, 111478935, 111478936, 111478937, 111478938, 111478939, 111478940, 111478941, 111478942), last offset in append: 111479224. Log start offset = 95104666
Tried restarting the broker, but the set of under-replicated partitions just changes.
Tried moving the replica to another node, and that was unsuccessful.
Tried creating a new replica, and kafka-reassign-partitions is stuck waiting for the out-of-sync replica to catch up.
What can I do to fix this issue?
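For reference, the affected partitions can be listed with the stock tooling; zk-host:2181 below is a placeholder for the actual ZooKeeper connect string:

```shell
# List partitions whose ISR is smaller than the full replica set.
# (Kafka 1.x tooling still takes --zookeeper; newer releases use
# --bootstrap-server instead.)
kafka-topics.sh --zookeeper zk-host:2181 \
  --describe --under-replicated-partitions
```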
I'm not sure there's a way to force it to sync here. From what you're describing and the error you shared, I think what's happening is that the replica fetcher thread fails and the broker stops replicating data from the leader. That would explain why the broker has been out of sync for so long.
Are you using Cloudera's distribution of Kafka or is this Apache Kafka?
What version are you using?
I see that someone reported a similar issue very recently:
It's a fairly new issue that I personally haven't seen before with any of the current customers running the Cloudera Distribution of Kafka, but the latest releases (Cloudera Distribution of Kafka 3.1.1, and Kafka in CDH 6.0) are based on Apache Kafka 1.0.1. The plan for CDH 6.1 is to rebase Cloudera Kafka onto Apache Kafka 2.0, so it's probably just a matter of time until this becomes a more common issue.
You mentioned that restarting the Kafka service causes the set of problematic partitions to change. Is that also the case when you only shut down a single broker and start it up again? I'm asking because one potential workaround is to identify which broker is lagging behind and not joining the ISR, shut down that broker, delete the topic partition data (for the affected partitions only) from disk, and then start the broker up again.
The broker will start and self-heal by replicating all the data from the current leader of those partitions. Obviously this can take a long time depending on how many partitions are affected and how much data needs to be replicated.
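Sketched as shell steps, the workaround above might look like this; the data directory, partition name, and service commands are assumptions to adapt to your deployment (e.g. stop/start the broker role through Cloudera Manager instead of systemd):

```shell
# 1. Stop the lagging broker (via Cloudera Manager or your init system).
sudo systemctl stop kafka

# 2. Delete ONLY the affected partition directories under the broker's
#    log.dirs. /var/local/kafka/data is a hypothetical data dir; the
#    directory name follows the <topic>-<partition> pattern.
rm -rf /var/local/kafka/data/dev-raw-events-35

# 3. Start the broker; it rebuilds the deleted partition by fetching
#    everything from the current leader.
sudo systemctl start kafka
```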
I agree with your suggestion, and we are in the process of testing this in staging. Unfortunately we don't want to try it first on the problematic cluster, which is production, as we might corrupt something.
It's hard to replicate the issue in a staging environment, but at least try an rm -rf for one replica, restart the broker, and see how it behaves.
After doing some research, this is the issue we are facing: https://issues.apache.org/jira/browse/KAFKA-6361.
Just to be clear, you're only deleting data for the specific partitions that are impacted and not everything under the broker's data directory. I just wasn't sure what you meant by rm -rf here so wanted to clarify.
Good luck, and please do let us know of the outcome.
Yes, we only tried deleting the out-of-sync partition. It did not work.
After a lot of research we decided to increase replica.lag.time.max.ms to 8 days, since that is roughly how long a few of the replicas had been out of sync.
This resolved our issue, though it took a few hours for the followers to fetch and replicate the 7 days of data.
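For reference, a sketch of that change; the properties-file path is an assumption (on CDH you would normally set this through the Kafka safety valve in Cloudera Manager), and on Kafka 1.x this is a static broker config, so a rolling restart is needed for it to take effect:

```shell
# 8 days = 8 * 24 * 60 * 60 * 1000 ms = 691200000 ms
echo "replica.lag.time.max.ms=691200000" >> /etc/kafka/server.properties
# ...then rolling-restart the brokers so the new value is picked up.
```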
https://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/ helped us understand ISRs.