Reassignment of a replica across Kafka volumes in a JBOD never completes

Expert Contributor

This is a partition move from one volume to another within a single broker.

I started a kafka-reassign-partitions run to move a partition from one volume to another, but it never completes. Has anyone faced this?

  • The Kafka cluster is in a healthy state
  • The controller is fine
  • Source and destination are also fine
  • The replica is not a leader
  • So far I haven't seen any errors in the logs

 

The directories look like the listing below. Here I am moving a partition from the sdc volume to the sdf volume. Inodes also look fine.


desind@xxx:~#> sudo du -sh /kafka/data/sdc/prod-events-48
76G /kafka/data/sdc/prod-events-48

 

desind@xxx:~#> sudo du -sh /kafka/data/sdf/prod-events-48.b82f63b489554ef4b2e2f4c514bc1bc0-future
76G /kafka/data/sdf/prod-events-48.b82f63b489554ef4b2e2f4c514bc1bc0-future


desind@xxx:~#> find /kafka/data -type d -name 'prod-events-48*'
/kafka/data/sdf/prod-events-48.b82f63b489554ef4b2e2f4c514bc1bc0-future
/kafka/data/sdc/prod-events-48
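
The `find` above can also be sketched in Python, which may be handy when checking several partitions at once. This is only a minimal sketch: the function name `find_future_dirs` is hypothetical, and the `<data root>/<volume>/<partition dir>` layout is assumed from the listing above.

```python
from pathlib import Path


def find_future_dirs(data_root):
    """Return partition directories under data_root whose name ends in '-future'.

    Assumes the layout <data_root>/<volume>/<partition dir>, as in the
    listing above. A '-future' directory is the copy being built on the
    destination volume while a move is in progress.
    """
    root = Path(data_root)
    return sorted(str(p) for p in root.glob("*/*-future") if p.is_dir())
```

Calling `find_future_dirs("/kafka/data")` on the broker in question would be expected to list the `-future` directory on sdf for as long as the move has not finished.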

 

 

 

Explorer

I ran the reassignment previously using the command below with the JSON I shared earlier. Do you want me to run the same command again?

 

/kafka-reassign-partitions.sh --execute --bootstrap-server XXXX:9092 --reassignment-json-file ./test.json --zookeeper XXXXX:2181

 

As this is production, I want to confirm once again.

Expert Contributor

Yes. What we are doing here is stopping the reassignment that is not progressing. To do that, run it again with the original JSON file; it will stop and revert.

Explorer

Hi, I was able to replicate the same issue in the test environment. I reran the --execute command with the same JSON file, but nothing changed.

Any other suggestions?

Expert Contributor

That's weird.

So in log_dirs, specify "any" for every replica and see if that works.

Example:

{
  "partitions": [{
    "topic": "foo",
    "partition": 1,
    "replicas": [1, 2, 3],
    "log_dirs": ["any", "any", "any"]
  }],
  "version": 1
}

 

This is what the documentation says: "Broker will cancel existing movement of the replica if 'any' is specified as destination log directory."
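
If the same file is needed for other partitions, it can be generated rather than edited by hand. The following is a rough sketch only; `build_cancel_plan` is a hypothetical helper, and it simply reproduces the JSON structure shown in the example, with one "any" entry per replica.

```python
import json


def build_cancel_plan(topic, partition, replicas):
    """Build reassignment JSON that sets every log dir to "any".

    The structure mirrors the example in this reply; log_dirs must have
    exactly one entry per replica.
    """
    plan = {
        "partitions": [{
            "topic": topic,
            "partition": partition,
            "replicas": list(replicas),
            "log_dirs": ["any"] * len(replicas),
        }],
        "version": 1,
    }
    return json.dumps(plan, indent=2)
```

For example, `build_cancel_plan("foo", 1, [1, 2, 3])` produces the same content as the example, which can then be saved and passed to the tool with --reassignment-json-file.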

Explorer

With "any" it worked. I also had to restart the Kafka service to get the old log segments deleted.

Thanks very much for your support.

Explorer

I implemented the same thing in production and the reassignment shows as completed, but the old log segments and the -future folder are not getting deleted.

In the logs I can see the following:

 INFO [ReplicaAlterLogDirsThread-1]: Partition XXXX-1 has an older epoch (102) than the current leader. Will await the new LeaderAndIsr state before resuming fetching. (kafka.server.ReplicaAlterLogDirsThread)
INFO [Partition XXX-1 broker=3] XXX-1 starts at Leader Epoch 106 from offset 0. Previous Leader Epoch was: 105 (kafka.cluster.Partition)

 

After I restarted the service, it copied data from one folder to the other for some time and then stopped.

Expert Contributor

Now that the move has stopped/completed, verify that there are no under-replicated or offline partitions, and that all 3 replicas exist and are in sync (the number depends on your replication factor).

You can then delete those leftover partition folders manually and restart the broker.

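If replicas are out of sync, they should catch up with the leader once the broker is back. Note that unclean.leader.election.enable only controls whether an out-of-sync replica may be elected leader; it does not bring replicas back in sync.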
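
The in-sync check can be scripted against the text printed by `kafka-topics.sh --describe`. What follows is a sketch under assumptions: the function name `under_replicated` is hypothetical, and the per-partition line format (`Topic: … Partition: … Leader: … Replicas: … Isr: …`) is assumed here and may differ between Kafka versions.

```python
import re

# Matches per-partition lines only; the topic summary line uses
# "PartitionCount:" and therefore does not match "Partition:".
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>\S+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def under_replicated(describe_output):
    """Return (topic, partition) pairs whose ISR differs from the replica set."""
    problems = []
    for line in describe_output.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue  # not a per-partition line
        replicas = set(m["replicas"].split(","))
        isr = set(filter(None, m["isr"].split(",")))  # tolerate an empty ISR
        if isr != replicas:
            problems.append((m["topic"], int(m["partition"])))
    return problems
```

An empty result suggests every listed partition has all of its replicas in sync; anything returned is worth checking before deleting folders by hand.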

Explorer

Now all is fine. After deleting the folders and restarting the service, the issue is resolved.

Thanks very much for the support.

avatar

We had the same issue. I tried this, but it does not seem to work.