05-27-2025
04:08 AM
Cruise Control is an open-source solution that automates dynamic workload rebalancing and self-healing for large-scale Apache Kafka clusters. With the widespread adoption of Apache Kafka, many companies now maintain Kafka clusters with hundreds or even thousands of brokers. Operating clusters at this scale can lead to frequent broker failures and a substantial burden in redistributing workload. Cruise Control was designed to solve these operational scalability challenges.

What is Cloudera Streams Messaging - Kubernetes Operator?

Cloudera Streams Messaging - Kubernetes Operator brings enterprise-grade Kafka deployments to existing Kubernetes infrastructures. It allows flexible, agile, and rapid deployment as well as scaling for variable workloads. This solution enables the deployment of Kafka and related components on existing, shared Kubernetes infrastructure, eliminating the need for dedicated infrastructure.

Cruise Control is also part of this product. It enables features such as manual and automatic cluster scaling with workload rebalancing when adding or removing brokers or changing topic replica values. Cruise Control can also supply information about cluster load, monitor the cluster for anomalies, and perform (limited) automated self-healing actions. Currently, not all Cruise Control capabilities are available in a Kubernetes environment, but the number of supported features is increasing with each release.

What is ZooKeeper?

Apache ZooKeeper is a centralized coordination service designed to manage and facilitate critical functions necessary for distributed systems. These functions include maintaining configuration data, enabling reliable naming mechanisms, ensuring distributed synchronization, and supporting group membership services.
Each of these features is foundational to the proper operation of distributed applications, which often rely on such mechanisms to maintain consistency, manage shared resources, and coordinate actions across different nodes in a distributed environment. When developers attempt to build these capabilities independently within their applications, they face significant engineering challenges. Implementing these features correctly requires addressing complex issues such as race conditions, deadlocks, and consistency violations. Moreover, relying on one standard implementation avoids having to maintain many heterogeneous ones. ZooKeeper addresses these challenges by offering a standardized solution that simplifies the development and deployment of distributed systems, thereby enhancing their resilience, manageability, and scalability.

ZooKeeper played a mandatory role in the operation of Apache Kafka (and, by extension, Cruise Control) by providing essential distributed coordination services. It was a critical component for Kafka’s high availability, fault tolerance, and system consistency. ZooKeeper facilitates cluster management by enabling broker registration, monitoring cluster membership, and orchestrating controller election to ensure that only one broker serves as the active controller at a time. It also disseminates cluster event notifications, allowing Kafka to respond dynamically to changes such as broker failures.

In addition to broker coordination, ZooKeeper historically supported Kafka consumers by managing group coordination, partition assignment, offset storage, and consumer metadata; in current Kafka versions the brokers handle these tasks. In that role it ensured efficient partition rebalancing and tracked consumer progress, contributing to robust and fault-tolerant data stream consumption. In ZooKeeper mode, Kafka relies on ZooKeeper for key operational tasks including metadata storage, leader election using ephemeral ZNodes, and dynamic configuration of topics.
It also supports access control mechanisms via ACLs (Access Control Lists) and enforces client-specific resource quotas, such as bandwidth and request rates. Together, Kafka and ZooKeeper formed a tightly integrated system that ensured scalability, resiliency, and consistent state management in distributed messaging environments.

Life without ZooKeeper

Running a separate component alongside the Kafka cluster creates additional maintenance and operational requirements, increasing the complexity of the environment. Additionally, ZooKeeper has become a bottleneck that limits the number of partitions that a single broker can handle.

KRaft (Kafka Raft) is an event-driven implementation of the Raft protocol, featuring a quorum-based controller that manages an event log. It uses a single-partition topic called “__cluster_metadata” to store cluster metadata. KRaft operates in a leader-follower model: the leader writes events to the metadata topic, and these events are then replicated to follower controllers via the KRaft replication algorithm. The leader of this single-partition topic also serves as the controller node for the Kafka cluster.

Apache Kafka officially deprecated ZooKeeper in version 3.5. KRaft mode has been available as early access since version 2.8, and the migration scripts were published starting with version 3.4.

Cruise Control has been able to run without ZooKeeper since 2022. Version 2.5.88 of Cruise Control officially supports this behaviour. Furthermore, starting with Cloudera Streams Messaging - Kubernetes Operator version 1.3.0, which includes Cruise Control version 2.5.141.1.3.0 (a Cloudera fork based on Cruise Control version 2.5.141), no additional configuration changes are required, as all ZooKeeper-related features and dependencies have been removed.

Cruise Control was built to use ZooKeeper for metadata-related queries and as a backup store for failed brokers and similar information.
These uses of ZooKeeper are not critical, since all of the metadata is also available through KafkaAdmin API queries. None of Cruise Control's use cases directly depends on ZooKeeper.

With the removal of ZooKeeper, starting up Cruise Control became simpler, since there is no need to wait for an external service. This also means Cruise Control has fewer parts that can fail, which makes it more robust.

Cruise Control integration tests were heavily dependent on ZooKeeper, since all Kafka clusters were installed with ZooKeeper as the metadata store in the past. With KRaft now being generally available in Kafka, Cloudera has migrated the ZooKeeper-based tests to KRaft and has contributed this enhancement back to the community.

From a user's point of view, there is almost no difference between running Cruise Control with or without ZooKeeper. The only change is that less configuration is required, and in Kubernetes-based environments that configuration is handled by the Strimzi Cluster Operator. All endpoints, self-healing, and other features work as before. Additionally, removing ZooKeeper removes the bottleneck it introduced, since KRaft nodes have limitations similar to those of normal Kafka broker nodes.

ZooKeeper-related features in Cruise Control before the 1.3.0 release

In previous versions of Cruise Control, there was an option to detect broker failures using ZooKeeper. The default behaviour, however, used Kafka for this purpose, and there was no real advantage or reason to use ZooKeeper-based detection. This was a legacy feature.

With the "failed.brokers.zk.path" property, Cruise Control provided a deprecated way to use ZooKeeper as a backup store for failed brokers.

The "KafkaTopicConfigProvider" was a deprecated feature that was replaced by a Kafka Admin Client-based solution: the "KafkaAdminTopicConfigProvider" class.
This class can be used as the value of the "topic.config.provider.class" property (“provider class that reports the active configuration of topics”), and it is also the default value of that property.

There were many "zookeeper.*" prefixed configurations that were passed to ZooKeeper, but they are no longer needed.

Cruise Control with a KRaft-based Kafka cluster does not require any KRaft-specific configuration, as the latest version of this application does not include any metadata store-related features.

Cruise Control integration tests used ZooKeeper-based Kafka clusters.

Cloudera Streams Messaging - Kubernetes Operator is based on Strimzi, and the Strimzi Cluster Operator does not allow modifying any of the previously mentioned properties, so none of these changes in Cruise Control affect users.

Why is removing ZooKeeper from Cruise Control necessary?

The 1.3.0 release of Cloudera Streams Messaging - Kubernetes Operator uses KRaft as the metadata store by default for Kafka clusters. Among other versions, release 1.3.0 supports Kafka 3.9.0; 3.9 is the last minor version of Kafka that includes support for ZooKeeper. Starting with Kafka 4.0, support for ZooKeeper is fully removed. The 3.9.0.1.3.0 Cloudera version of Kafka is the first version with general availability of KRaft mode (“ZooKeeper-less mode”). There will be no support for running in ZooKeeper mode or migrating from ZooKeeper-based clusters in future releases. ZooKeeper is deprecated in the 1.3.0 release and will be removed in a future release.

Therefore, removing ZooKeeper from Cruise Control made sense, not only to future-proof the product and reduce complexity but also to prepare for the removal of all the ZooKeeper dependencies in the future. ZooKeeper was entirely removed from Cruise Control in version 2.5.141.1.3.0, which is the Cloudera fork of Cruise Control based on 2.5.141.
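For readers who maintain their own Cruise Control properties file outside of Kubernetes, the legacy settings named in this article ("zookeeper.*" prefixed keys, "failed.brokers.zk.path", and a "topic.config.provider.class" pointing at "KafkaTopicConfigProvider") can be spotted with a short script. The following is only a minimal sketch and not an official Cruise Control tool: the function names, the matching rules, and every value in the sample configuration (including the package name of the provider class) are assumptions made for illustration.

```python
"""Flag ZooKeeper-era keys in a Cruise Control style properties file (sketch)."""

# Key and class names are taken from the article; the matching rules are an
# illustrative assumption, not behaviour defined by Cruise Control itself.
LEGACY_EXACT_KEYS = {"failed.brokers.zk.path"}
LEGACY_PREFIX = "zookeeper."
PROVIDER_KEY = "topic.config.provider.class"
DEPRECATED_PROVIDER = "KafkaTopicConfigProvider"


def parse_properties(text):
    """Parse simple 'key=value' lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def find_legacy_settings(props):
    """Return a sorted list of (key, reason) pairs for ZooKeeper-era settings."""
    findings = []
    for key, value in props.items():
        if key in LEGACY_EXACT_KEYS or key.startswith(LEGACY_PREFIX):
            findings.append((key, "ZooKeeper-related setting, no longer needed"))
        elif key == PROVIDER_KEY and value.split(".")[-1] == DEPRECATED_PROVIDER:
            findings.append(
                (key, "deprecated provider, use KafkaAdminTopicConfigProvider")
            )
    return sorted(findings)


if __name__ == "__main__":
    # Hypothetical sample configuration; hosts, paths and the package name
    # are placeholders, not values taken from a real deployment.
    sample = """
    bootstrap.servers=localhost:9092
    zookeeper.connect=localhost:2181
    failed.brokers.zk.path=/CruiseControlBrokerList
    topic.config.provider.class=com.linkedin.kafka.cruisecontrol.config.KafkaTopicConfigProvider
    """
    for key, reason in find_legacy_settings(parse_properties(sample)):
        print(f"{key}: {reason}")
```

In Strimzi-based deployments such as Cloudera Streams Messaging - Kubernetes Operator, this kind of check is not needed, because the Cluster Operator manages these properties itself.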
Summary

Cruise Control is a tool for managing large-scale Apache Kafka clusters. The next chapter of Kafka started with the 4.0 release in March 2025. In this new version, ZooKeeper is no longer supported, so Kafka-related tools like Cruise Control can no longer use ZooKeeper as a source of information. In this article, we covered the Cruise Control changes related to the removal of ZooKeeper, including the elimination of old configurations, the deprecation of certain features, and the migration of integration tests to KRaft-based Kafka clusters. We also described deprecated Cruise Control features and the challenges faced during this process.

Interested in joining Cloudera?

At Cloudera, we are working on fine-tuning big data related software bundles (based on Apache open-source projects) to provide our customers with a seamless experience while they run their analytics or machine learning projects on petabyte-scale datasets. Start your 5-day trial for Cloudera and explore our streaming data distribution. If you are interested in enterprise data management, would like to know more about Cloudera, or are just open to a discussion with techies, visit our Budapest office at our upcoming meetups.

Resources

View Product Release Notes for Cloudera Streams Messaging - Kubernetes Operator Documentation
New - What’s New in Cloudera Streams Messaging - Kubernetes Operator 1.3
Cloudera Stream Processing Product Page
Cloudera Kubernetes Operators documentation homepage
Accelerate Streaming Pipeline Deployments with New Kubernetes Operators (webinar recording)