Created on 09-26-2022 12:05 AM - last edited on 09-26-2022 01:17 AM by VidyaSargur
I enabled Cruise Control for our Kafka cluster to perform self-healing.
My expectation is to get a notification whenever Cruise Control performs an action; otherwise I can't tell whether CC is in action or not. Is there any provision for this?
Created 09-28-2022 02:41 AM
- You should check whether the Cruise Control endpoints are working as expected.
- Automatic rebalancing is DISABLED by default, but it can be enabled by adding self.healing.enabled=true to the cruisecontrol.properties advanced configuration snippet in the CM UI. You can also enable self-healing only for specific types of events, such as broker failures, disk failures, metric anomalies, or goal violations.
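For reference, a sketch of what that advanced snippet might contain. The per-anomaly property names below come from upstream (LinkedIn) Cruise Control and should be verified against the version shipped with your distribution:

```properties
# Advanced configuration snippet for cruisecontrol.properties (sketch;
# verify property names against your Cruise Control version).
# Global switch:
self.healing.enabled=true
# Or enable self-healing per anomaly type instead of globally:
self.healing.broker.failure.enabled=true
self.healing.goal.violation.enabled=true
self.healing.disk.failure.enabled=true
self.healing.metric.anomaly.enabled=true
```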
More properties are documented at the link below.
And below are the REST APIs you can use on CC even without enabling self.healing.
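As an illustration of how those REST calls are addressed, here is a minimal sketch that builds the URLs for the commonly used `state` and `rebalance` endpoints. It assumes the upstream default base path `/kafkacruisecontrol`; the host and port (`cc-host:9090`) are placeholders for your deployment:

```python
# Sketch of Cruise Control REST endpoint URLs; "cc-host:9090" is a
# placeholder -- point it at your own Cruise Control instance.
from urllib.parse import urlencode

CC = "http://cc-host:9090/kafkacruisecontrol"

def cc_url(endpoint: str, **params: str) -> str:
    """Build the full URL for a Cruise Control REST endpoint."""
    query = urlencode(params)
    return f"{CC}/{endpoint}" + (f"?{query}" if query else "")

# State of the load monitor, executor, and anomaly detector (GET):
print(cc_url("state"))
# Rebalance proposal only -- no data is moved (POST):
print(cc_url("rebalance", dryrun="true"))
# Execute the rebalance (POST):
print(cc_url("rebalance", dryrun="false"))
```

Running a `dryrun=true` call first and reviewing the proposal before executing is the usual practice.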
Created 09-28-2022 03:17 AM
Self-healing is already enabled. When a broker goes down, CC rebalances its partitions to the healthy nodes, but when that node comes back up, redistribution does not happen automatically. I have to run the rebalance API manually.
Created 09-28-2022 11:56 AM
Please let us know what type of rebalancing you are looking for when a broker comes back online:
- a full disk/partition rebalance from the other brokers onto the returning broker, or
- only a leader rebalance?
Created 09-28-2022 12:45 PM
I expect both leaders and replicas to be rebalanced. In the example below, when broker 1111 went down, CC self-healed (distributed leaders/partitions automatically to the other nodes). When node 1111 came back, redistribution did not happen automatically.
BROKER LEADER(S) REPLICAS OUT-OF-SYNC OFFLINE IS_CONTROLLER
1234 36 118 0 0 false
5678 44 123 0 0 false
1111 0 0 0 0 false
3333 44 126 0 0 false
4444 44 126 0 0 true
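In upstream Cruise Control, self-healing reacts to broker failures, while moving replicas back onto a broker that has rejoined is a separate, on-demand operation via the `add_broker` endpoint. A sketch of how that request is formed; the host/port is a placeholder, and 1111 is the returning broker id from the table above:

```python
# Sketch: build the add_broker request URL to move replicas back onto a
# rejoined broker. "cc-host:9090" is a placeholder for your CC endpoint.
from urllib.parse import urlencode

CC = "http://cc-host:9090/kafkacruisecontrol"

params = urlencode({"brokerid": "1111", "dryrun": "true"})
add_broker_url = f"{CC}/add_broker?{params}"

# Issue this as a POST (e.g. curl -X POST "<url>"). Start with
# dryrun=true to review the proposal, then repeat with dryrun=false.
print(add_broker_url)
```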
Created 10-07-2022 08:37 AM
It looks like you have a case open for the same issue in the Cloudera support system, and we are reproducing and investigating it on priority. We will share a workaround in the case, and here in this community thread as well, so that it can help other users who come across the same issue.