When a Kafka cluster is over-subscribed, the loss of a single broker can be a jarring experience for the cluster as a whole. This is especially true when trying to bring a previously failed broker back into a cluster.

To soften the impact of returning a broker that has been out of the cluster for a number of days, you can first remove that broker's ID from the Replicas list of every partition that contains it. Otherwise, as soon as the broker rejoins, it starts catching up on all of its partitions at once.
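The removal step can be sketched in Python. This assumes the current assignment has already been captured in the `{"version": 1, "partitions": [...]}` JSON layout that `kafka-reassign-partitions.sh` reads. The function name `remove_broker` and the sample topics are hypothetical, and this is not the Kawkfa AWK code.

```python
import json

def remove_broker(assignment: dict, broker_id: int) -> dict:
    """Build a reassignment document that drops broker_id from the replicas
    of every partition that currently lists it (step 1 of the procedure)."""
    partitions = []
    for p in assignment["partitions"]:
        if broker_id not in p["replicas"]:
            continue  # partitions without the broker are left out of the reassignment
        remaining = [r for r in p["replicas"] if r != broker_id]
        if not remaining:
            continue  # a partition with no other replica cannot be handled this way
        partitions.append(
            {"topic": p["topic"], "partition": p["partition"], "replicas": remaining}
        )
    return {"version": 1, "partitions": partitions}

# Hypothetical current assignment; broker 3 is the one that has been away.
current = {
    "version": 1,
    "partitions": [
        {"topic": "events", "partition": 0, "replicas": [3, 1, 2]},
        {"topic": "events", "partition": 1, "replicas": [1, 2, 4]},
        {"topic": "audit", "partition": 0, "replicas": [3]},
    ],
}
print(json.dumps(remove_broker(current, 3), indent=2))
```

Partitions that do not list the broker are omitted, so the resulting file only touches the partitions that actually need to change.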

Ideally, a Kafka cluster is sized to handle a single node failure. As is often the case, though, the workload on the cluster can quickly outgrow the physical hardware. While you wait for new hardware to arrive to expand the cluster, you still need to keep the existing cluster running as well as possible.

To that end, some AWK scripts available on GitHub help create the JSON files needed to spoon-feed partitions back onto a broker.
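One way to picture spoon-feeding is to split the partition entries into several small reassignment documents and apply them one at a time. For example, you could pass each file to `kafka-reassign-partitions.sh --reassignment-json-file ... --execute` and wait for `--verify` to report completion before moving on. The Python sketch below (not Kawkfa's AWK code) assumes that batching approach. `batch_documents` and the batch size are hypothetical choices.

```python
import json

def batch_documents(partitions: list, batch_size: int) -> list:
    """Split partition entries into reassignment documents holding at most
    batch_size partitions each, so they can be applied one file at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    docs = []
    for start in range(0, len(partitions), batch_size):
        chunk = partitions[start:start + batch_size]
        # json.dumps never emits a trailing comma, so each document parses cleanly.
        docs.append(json.dumps({"version": 1, "partitions": chunk}, indent=2))
    return docs

# Hypothetical entries: five partitions of one topic.
entries = [
    {"topic": "events", "partition": n, "replicas": [1, 2, 3]} for n in range(5)
]
for i, doc in enumerate(batch_documents(entries, 2)):
    print(f"--- reassign-{i}.json ---")
    print(doc)
```

Smaller batches limit how much data is copied to the returning broker at once, at the cost of more rounds of reassignment.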

This collection of scripts, playfully called Kawkfa, is alpha quality at best and has its bugs, but someone may find it useful in the situation described above.

The high level procedure is as follows:

  1. For each partition entry that includes the broker.id of the failed node, remove that broker ID from the Replicas list
  2. Bring the wayward broker back into the cluster
  3. Add the wayward broker ID back to the Replicas list, but without making it the preferred replica (the preferred replica is the first one listed, so add the broker ID at the end)
  4. Once the broker has been added back to its partitions, make it the preferred replica for a random subset of those partitions
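Steps 3 and 4 come down to reordering each partition's replicas list, because Kafka treats the first replica listed as the preferred one. The Python sketch below shows that reordering on entries in the reassignment JSON layout. The function names, the `count` parameter (pick it however you like, randomly or otherwise), and the sample data are hypothetical, not taken from Kawkfa.

```python
import random

def add_back_non_preferred(entries: list, broker_id: int) -> list:
    """Step 3: append broker_id after the existing replicas, so the current
    first (preferred) replica of each partition is unchanged."""
    result = []
    for e in entries:
        replicas = list(e["replicas"])
        if broker_id not in replicas:
            replicas.append(broker_id)
        result.append({**e, "replicas": replicas})
    return result

def promote_random_subset(entries: list, broker_id: int, count: int, seed=None) -> list:
    """Step 4: move broker_id to the front (preferred position) of a random
    subset of up to `count` partitions that already list it as a replica."""
    rng = random.Random(seed)
    eligible = [i for i, e in enumerate(entries) if broker_id in e["replicas"]]
    chosen = set(rng.sample(eligible, min(count, len(eligible))))
    result = []
    for i, e in enumerate(entries):
        replicas = list(e["replicas"])
        if i in chosen:
            replicas.remove(broker_id)
            replicas.insert(0, broker_id)
        result.append({**e, "replicas": replicas})
    return result

# Hypothetical partitions after step 1 removed broker 3.
shrunk = [
    {"topic": "events", "partition": 0, "replicas": [1, 2]},
    {"topic": "events", "partition": 1, "replicas": [2, 4]},
    {"topic": "events", "partition": 2, "replicas": [4, 1]},
]
restored = add_back_non_preferred(shrunk, 3)
promoted = promote_random_subset(restored, 3, count=1, seed=42)
print(restored)
print(promoted)
```

Changing the order only changes which broker is preferred. Leadership actually moves when a preferred replica election runs, either manually or automatically when `auto.leader.rebalance.enable` is true on the brokers.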

Caveats about the scripts:

  • Use the scripts at your own risk. Be careful, and understand what the scripts do before using them
  • There are bugs in the scripts. The most notable is that they add an extra comma after the last partition entry, which makes the JSON invalid. Simply removing that comma allows the JSON file to be read properly
  • Have fun!
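If you would rather not remove the stray comma by hand, a small cleanup pass can do it. The Python sketch below is a hypothetical helper, not part of Kawkfa. It strips any comma that is followed only by whitespace and a closing `]` or `}`. It is deliberately naive and would also alter such a sequence inside a string value. Kafka topic names are limited to letters, digits, `.`, `_` and `-`, so that should not occur in a reassignment file.

```python
import json
import re

# A comma followed only by whitespace before a closing ] or }.
TRAILING_COMMA = re.compile(r",(\s*[\]}])")

def strip_trailing_commas(text: str) -> str:
    """Remove trailing commas before a closing bracket or brace."""
    return TRAILING_COMMA.sub(r"\1", text)

# Hypothetical script output with the extra comma after the last entry.
broken = '{"version":1,"partitions":[{"topic":"events","partition":0,"replicas":[1,2,3]},\n]}'
fixed = strip_trailing_commas(broken)
print(json.loads(fixed)["partitions"])
```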
Last update: 07-13-2017 02:58 PM