Created on 05-22-2019 04:51 PM - edited 09-16-2022 07:24 AM
This is a partition move from one volume to another on a single broker.
I started a kafka-reassign-partitions move of a partition from one volume to another, but it never completes. Has anyone faced this?
The directories look like the below. Here I am moving a partition from the sdc volume to the sdf volume. Inodes also look fine.
desind@xxx:~#> sudo du -sh /kafka/data/sdc/prod-events-48
76G /kafka/data/sdc/prod-events-48
desind@xxx:~#> sudo du -sh /kafka/data/sdf/prod-events-48.b82f63b489554ef4b2e2f4c514bc1bc0-future
76G /kafka/data/sdf/prod-events-48.b82f63b489554ef4b2e2f4c514bc1bc0-future
desind@xxx:~#> find /kafka/data -type d -name 'prod-events-48*'
/kafka/data/sdf/prod-events-48.b82f63b489554ef4b2e2f4c514bc1bc0-future
/kafka/data/sdc/prod-events-48
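For anyone who wants to check several volumes at once, here is a rough Python sketch that lists the "-future" directories left by an unfinished move. It assumes the naming seen in the listing above (topic-partition, a dot, a hex id, then "-future"); the function name and the regex are my own, not part of Kafka's tooling.

```python
import os
import re

# Directory naming as seen in the listing above:
#   "<topic>-<partition>.<hex id>-future"
FUTURE_DIR = re.compile(r"^(?P<tp>.+-\d+)\.[0-9a-f]+-future$")

def find_future_dirs(log_dirs):
    """Return (log_dir, topic_partition, dir_name) for each '-future' directory."""
    found = []
    for log_dir in log_dirs:
        if not os.path.isdir(log_dir):
            continue  # skip volumes that do not exist on this host
        for name in sorted(os.listdir(log_dir)):
            match = FUTURE_DIR.match(name)
            if match and os.path.isdir(os.path.join(log_dir, name)):
                found.append((log_dir, match.group("tp"), name))
    return found

if __name__ == "__main__":
    # Paths taken from the post above; adjust to your broker's log.dirs.
    for log_dir, tp, name in find_future_dirs(["/kafka/data/sdc", "/kafka/data/sdf"]):
        print(f"{tp}: pending move into {log_dir} ({name})")
```

A "-future" directory that stays around for days, as in this thread, suggests the move is stuck rather than just slow.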
Created 11-27-2019 08:22 AM
That's weird.
So in log_dirs, specify "any", "any", "any" and see if that works.
Example:
{
"partitions": [{
"topic": "foo",
"partition": 1,
"replicas": [1, 2, 3],
"log_dirs": ["any", "any", "any"]
}],
"version": 1
}
This is what the documentation says: "Broker will cancel existing movement of the replica if "any" is specified as destination log directory."
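If you need this for more than one partition, a small Python sketch like the one below can generate the same JSON. The helper name cancel_move_plan is made up for illustration; the output follows the structure of the example above, with one "any" per replica.

```python
import json

def cancel_move_plan(topic, partition, replicas):
    """Build a reassignment plan that sets every log dir to "any"."""
    return {
        "partitions": [{
            "topic": topic,
            "partition": partition,
            "replicas": list(replicas),
            # One "any" entry per replica, in the same order as "replicas".
            "log_dirs": ["any"] * len(replicas),
        }],
        "version": 1,
    }

if __name__ == "__main__":
    # Same values as the example above.
    print(json.dumps(cancel_move_plan("foo", 1, [1, 2, 3]), indent=2))
```

Save the output to a file and pass it with --reassignment-json-file, the same way as any other plan.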
Created 06-05-2019 08:35 AM
Created on 06-05-2019 09:57 AM - edited 06-05-2019 09:59 AM
I am using Kafka 2.0.
I am moving a partition that is around 76G. You can see the size in my original post.
kafka-reassign-partitions --zookeeper xxxx:2181/kafka --execute --reassignment-json-file reassign.json --bootstrap-server xxxx:9092
Created 06-05-2019 04:27 PM
Can you share the output of the verify command for the partition reassignment?
kafka-reassign-partitions --zookeeper xxxx:2181/kafka --verify --reassignment-json-file reassign.json --bootstrap-server xxxx:9092
Created 06-05-2019 05:52 PM
It has been quite some time since I ran that, and I don't have the session details. However, I think the output would say "reassignment of replica still in progress".
Reassignment of partition prod-events-45 completed successfully
Reassignment of replica prod-events-45-98 in progress
Reassignment of replica prod-events-45-154 completed successfully
Reassignment of replica prod-events-45-157 completed successfully
I tested this in a staging cluster with no incoming messages, and it works fine.
In production, events were being written to this topic at 3000 events/sec and the move did not complete. I was able to somehow stop the reassignment without causing any issues to the partitions.
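To keep an eye on a long-running move, something like this Python sketch can pull the unfinished replicas out of the --verify output. It only assumes the two line wordings that appear in this thread ("in progress" and "is still in progress"); the function name is my own.

```python
import re

# Matches both wordings seen in this thread:
#   "Reassignment of replica X in progress" and "... X is still in progress".
IN_PROGRESS = re.compile(
    r"^Reassignment of replica (?P<replica>\S+) (?:is still )?in progress$"
)

def replicas_in_progress(verify_output):
    """Return (topic, partition, broker_id) for each replica still moving."""
    pending = []
    for line in verify_output.splitlines():
        match = IN_PROGRESS.match(line.strip())
        if not match:
            continue
        # Replica ids look like "<topic>-<partition>-<broker>";
        # the topic itself may contain "-", so split from the right.
        topic, partition, broker = match.group("replica").rsplit("-", 2)
        pending.append((topic, int(partition), int(broker)))
    return pending

sample = """\
Reassignment of partition prod-events-45 completed successfully
Reassignment of replica prod-events-45-98 in progress
Reassignment of replica prod-events-45-154 completed successfully
Reassignment of replica prod-events-45-157 completed successfully
"""
print(replicas_in_progress(sample))  # prints [('prod-events', 45, 98)]
```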
Created on 06-17-2019 12:04 PM - edited 06-17-2019 12:04 PM
I just retried it without the --throttle option and it works fine; I was able to successfully move two partitions across volumes on the same broker. In summary: use it without throttling.
I initially tried with a throttle and realized it was unable to complete the move because messages were also coming in at a steady pace. I also tried increasing the throttle and removing it completely, and the move still did not complete. Maybe something is going on with the throttle.
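One possible explanation, not confirmed anywhere in this thread, is simple arithmetic: if the throttled copy rate is not higher than the rate at which new data arrives, the copy can never catch up. A back-of-the-envelope Python sketch of that model follows. The 76G size and 3000 events/sec are from my posts; the 1 KiB average event size and both throttle values are made-up numbers for illustration only.

```python
def catch_up_seconds(backlog_bytes, copy_rate, ingest_rate):
    """Seconds until the copy catches up, or None if it never does in this model.

    Both rates are in bytes per second.
    """
    net_rate = copy_rate - ingest_rate
    if net_rate <= 0:
        return None  # new data arrives at least as fast as it is copied
    return backlog_bytes / net_rate

# 76G and 3000 events/sec come from the thread; the 1 KiB event size and
# the throttle values below are hypothetical.
backlog = 76 * 1024**3
ingest = 3000 * 1024

for throttle in (2_000_000, 50_000_000):
    eta = catch_up_seconds(backlog, throttle, ingest)
    status = "never catches up" if eta is None else f"about {eta / 60:.0f} minutes"
    print(f"throttle {throttle} B/s: {status}")
```

This ignores everything else the broker is doing, so treat it as a sanity check on the throttle value rather than a prediction.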
Created 11-26-2019 06:07 PM
Can you please let me know whether the issue was resolved? I am also facing the same issue. The reassignment has been running for 2 days. I ran the reassignment without throttling.
Also, after this I am seeing that old log files still exist, and retention.ms and retention.bytes are not working.
Created 11-26-2019 06:17 PM
Do you have the original assignment?
You can stop it by passing the original assignment. It will just revert. I did this a couple of times and it works.
I read it in this KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-113%3A+Support+replicas+movement+between+log+d...
Test in staging first, and then you can implement it in prod.
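If you saved the partition line from the topic describe output before starting (for example "Topic: foo Partition: 1 Leader: 3 Replicas: 3,0,1"), a sketch like the one below can turn it back into a reassignment JSON. The function name and the regex are my own, and I am assuming the describe line has the Topic/Partition/Leader/Replicas layout shown in that example.

```python
import json
import re

# Expects a partition line such as:
#   "Topic: foo Partition: 1 Leader: 3 Replicas: 3,0,1"
DESCRIBE_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*\S+\s+Replicas:\s*(?P<replicas>[\d,]+)"
)

def rollback_plan(describe_line):
    """Build a reassignment plan that restores the replicas on the given line."""
    match = DESCRIBE_LINE.search(describe_line)
    if match is None:
        raise ValueError(f"not a partition line: {describe_line!r}")
    replicas = [int(r) for r in match.group("replicas").split(",") if r]
    return {
        "version": 1,
        "partitions": [{
            "topic": match.group("topic"),
            "partition": int(match.group("partition")),
            "replicas": replicas,
        }],
    }

if __name__ == "__main__":
    line = "Topic: foo Partition: 1 Leader: 3 Replicas: 3,0,1"
    print(json.dumps(rollback_plan(line)))
```

The plan leaves out log_dirs on purpose, so it only restores the replica list; add a log_dirs list if you also need to pin the directories.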
Created 11-26-2019 06:35 PM
Before the reassignment, the replicas for topic XXX were as below:
Topic: XXXX Partition: 1 Leader: 3 Replicas: 3,0,1
Original Reassignment json file:
{"version":1,"partitions":[{"topic":"XXXXX","partition":1,"replicas":[3,0,8],"log_dirs":["/data1/kafka","/data3/kafka","/data1/kafka"]}]}
On node 3 I am moving the partition 1 data from data2 to data1, on node 0 I am moving the partition 1 data from data1 to data3, and node 8 is a new node.
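Since each log_dirs entry pairs up by position with the broker id at the same index in replicas, a quick Python sketch like this can print the mapping as a sanity check before running --execute. The function name is my own, and treating a missing log_dirs as "any" is an assumption made for this sketch.

```python
import json

def replica_log_dirs(plan_json):
    """Map (topic, partition) -> {broker_id: log_dir} for a reassignment plan."""
    plan = json.loads(plan_json)
    mapping = {}
    for entry in plan["partitions"]:
        replicas = entry["replicas"]
        # Assumption for this sketch: a missing "log_dirs" means "any" everywhere.
        log_dirs = entry.get("log_dirs", ["any"] * len(replicas))
        if len(log_dirs) != len(replicas):
            raise ValueError("replicas and log_dirs must have the same length")
        # Entries pair up by position: replicas[i] goes to log_dirs[i].
        mapping[(entry["topic"], entry["partition"])] = dict(zip(replicas, log_dirs))
    return mapping

if __name__ == "__main__":
    plan = ('{"version":1,"partitions":[{"topic":"XXXXX","partition":1,'
            '"replicas":[3,0,8],'
            '"log_dirs":["/data1/kafka","/data3/kafka","/data1/kafka"]}]}')
    for (topic, partition), dirs in replica_log_dirs(plan).items():
        for broker, log_dir in dirs.items():
            print(f"{topic}-{partition} on broker {broker} -> {log_dir}")
```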
Status of verify as below:
Status of partition reassignment:
Reassignment of partition XXXX-1 completed successfully
Reassignment of replica XXXX-1-3 is still in progress
Reassignment of replica XXXX-1-0 is still in progress
Reassignment of replica XXXX-1-8 completed successfully
I didn't set any of the values you mentioned. The problem here now is that old logs are not getting deleted, and in the logs I am seeing errors as below:
INFO [Partition XXXX-1 broker=3] XXXX-1 starts at Leader Epoch 101 from offset 10647332982. Previous Leader Epoch was: 100 (kafka.cluster.Partition)
INFO [Partition XXXX-tapsampleapp-1 broker=3] XXXX-tapsampleapp-1 starts at Leader Epoch 99 from offset 555229. Previous Leader Epoch was: 98 (kafka.cluster.Partition)
INFO [ReplicaAlterLogDirsThread-1]: Partition XXXX-1 has an older epoch (100) than the current leader. Will await the new LeaderAndIsr state before resuming fetching. (kafka.server.ReplicaAlterLogDirsThread)
Initially it created one future folder and tried to do the reassignment, and then stopped suddenly.
Now my main concern is that old logs are not getting deleted and the file system is getting full.
Created on 11-26-2019 06:38 PM - edited 11-26-2019 06:45 PM
To stop the move, run the reassignment again with the --execute option using the original:
{"version":1,"partitions":[{"topic":"XXXXX","partition":1,"replicas":[3,0,8],"log_dirs":["/data1/kafka","/data3/kafka","/data1/kafka"]}]}
Then, instead of moving it within the same broker, try moving the partition to a different broker.