[ANNOUNCEMENT] Second release of Cloudera Labs Kafka Integration

We're pleased to announce the second release of Cloudera Labs Kafka Integration. This release includes Kafka version 0.8.2-beta with several notable new features:


New producer: The new Kafka producer added in Kafka 0.8.2-beta combines the best features of the existing sync and async producers. Send requests are batched, allowing the new producer to perform as well as the async producer under load. Every send request also returns a future response object that can be used to retrieve status and exceptions. Note that the new producer API is still evolving and may change in future releases.
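To make the batching-plus-futures idea concrete, here is a minimal toy model in Python. It is not the Kafka client API: ToyBatchingProducer, make_in_memory_transport, and the batch_size parameter are hypothetical names made up for this illustration, and the "broker" is just an in-memory list.

```python
from concurrent.futures import Future


class ToyBatchingProducer:
    """Toy model only (NOT the Kafka API): batches sends, returns a future per send."""

    def __init__(self, batch_size, transport):
        self.batch_size = batch_size  # records to collect before sending one request
        self.transport = transport    # callable(list_of_records) -> list_of_offsets
        self.pending = []             # (record, future) pairs waiting for a flush

    def send(self, record):
        """Queue a record; the returned future resolves when its batch is sent."""
        future = Future()
        self.pending.append((record, future))
        if len(self.pending) >= self.batch_size:
            self.flush()
        return future

    def flush(self):
        """Send all queued records in one request and complete their futures."""
        batch, self.pending = self.pending, []
        if not batch:
            return
        try:
            offsets = self.transport([record for record, _ in batch])
        except Exception as exc:
            # A failed request is reported through every future in the batch.
            for _, future in batch:
                future.set_exception(exc)
        else:
            for (_, future), offset in zip(batch, offsets):
                future.set_result(offset)


def make_in_memory_transport():
    """Hypothetical stand-in for a broker: appends to a list and returns offsets."""
    log = []
    requests = []  # size of each request, to show that sends were batched

    def transport(records):
        requests.append(len(records))
        start = len(log)
        log.extend(records)
        return list(range(start, start + len(records)))

    return transport, log, requests
```

With a batch size of 3, the first two calls to send return futures that are not yet done. The third call triggers a single request, after which each future's result() yields the offset assigned to its record, and exception() reports a failure if the request did not succeed.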

Delete topic: Kafka now supports deletion of existing topics, including all data and replicas. This can be done through the kafka-topics admin tool.

Offset management: In previous versions, consumers that wanted to keep track of which messages had been consumed did so by updating the offset of the last consumed message in ZooKeeper. This new feature allows using Kafka itself to keep track of the offsets. Using this feature can significantly improve consumer performance.
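One way to picture offsets being tracked "in Kafka itself" is as commit records appended to a log, where the most recent record for a given consumer group, topic, and partition wins. The sketch below is a toy model under that assumption, not Kafka's implementation; ToyOffsetLog and its methods are hypothetical names.

```python
class ToyOffsetLog:
    """Toy model only: consumer offsets kept as keyed records in an append-only log."""

    def __init__(self):
        self.log = []  # list of ((group, topic, partition), offset) commit records

    def commit(self, group, topic, partition, offset):
        """Record the last consumed offset by appending; nothing is updated in place."""
        self.log.append(((group, topic, partition), offset))

    def fetch(self, group, topic, partition):
        """Return the most recently committed offset for the key, or None."""
        key = (group, topic, partition)
        for logged_key, offset in reversed(self.log):
            if logged_key == key:
                return offset
        return None

    def compact(self):
        """Keep only the latest record per key so the log does not grow forever."""
        latest = {}
        for key, offset in self.log:
            latest[key] = offset  # later records overwrite earlier ones
        self.log = list(latest.items())
```

Committing is a plain append, which is the kind of write a log handles cheaply, and compaction discards superseded commits without changing what fetch returns.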

Automatic leader rebalancing: Each partition starts with a randomly selected leader replica that handles requests for that partition. When a cluster first starts, leaders are evenly balanced across brokers. But after a broker restart, the partitions that broker was leading are handed to other brokers, which leaves the distribution unbalanced. With this feature enabled, leadership is assigned back to the original replica after a restart.
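The imbalance described here can be reproduced with a small simulation. This is a toy model, not Kafka's controller logic: ToyCluster and its methods are hypothetical, and it assumes that the first replica in each partition's replica list is the original (preferred) leader.

```python
from collections import Counter


class ToyCluster:
    """Toy model only: tracks which broker leads each partition."""

    def __init__(self, assignments):
        # assignments: partition -> ordered replica list; first entry is the
        # original (preferred) leader in this model.
        self.assignments = assignments
        self.live = {b for replicas in assignments.values() for b in replicas}
        self.leaders = {p: replicas[0] for p, replicas in assignments.items()}

    def stop_broker(self, broker):
        """Move leadership of the stopped broker's partitions to a live replica."""
        self.live.discard(broker)
        for p, replicas in self.assignments.items():
            if self.leaders[p] == broker:
                self.leaders[p] = next((r for r in replicas if r in self.live), None)

    def start_broker(self, broker):
        """The broker rejoins as a follower; existing leaders stay where they are."""
        self.live.add(broker)
        for p, replicas in self.assignments.items():
            if self.leaders[p] is None and broker in replicas:
                self.leaders[p] = broker  # a leaderless partition needs a leader

    def rebalance(self):
        """Hand leadership back to the original replica wherever it is alive."""
        for p, replicas in self.assignments.items():
            if replicas[0] in self.live:
                self.leaders[p] = replicas[0]

    def leader_counts(self):
        """Return broker -> number of partitions it currently leads."""
        return dict(Counter(self.leaders.values()))
```

With six partitions spread over three brokers, each broker initially leads two partitions. Stopping and restarting one broker leaves it leading none while the other two lead three each, and calling rebalance restores two leaders per broker.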

Connection quotas: Kafka administrators can limit the number of connections allowed from a single IP. By default, we limit it to 10 connections per IP. This feature prevents a misbehaving client from destabilizing a Kafka broker by opening a very large number of connections and using all available file handles.
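The per-IP limit amounts to a counter that is checked before a connection is accepted. The following is a minimal sketch of that idea, not the broker's actual code; ToyConnectionQuota is a hypothetical name, and the default of 10 simply mirrors the limit mentioned above.

```python
from collections import defaultdict


class ToyConnectionQuota:
    """Toy model only: rejects connections beyond a per-IP limit."""

    def __init__(self, max_per_ip=10):
        self.max_per_ip = max_per_ip
        self.open_by_ip = defaultdict(int)  # ip -> currently open connections

    def try_accept(self, ip):
        """Return True and count the connection, or False if the IP is at its limit."""
        if self.open_by_ip[ip] >= self.max_per_ip:
            return False
        self.open_by_ip[ip] += 1
        return True

    def close(self, ip):
        """Release one connection slot for the IP when a connection closes."""
        if self.open_by_ip[ip] > 0:
            self.open_by_ip[ip] -= 1
```

A client that keeps opening connections is refused once it holds ten, while clients on other addresses are unaffected, so one misbehaving host cannot exhaust the broker's file handles on its own.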
 
Here's how to get started:

  1. Download the Kafka-Cloudera Labs CSD.
  2. Install the CSD into Cloudera Manager using these instructions. Installation of the CSD will add a new parcel repository to your Cloudera Manager configuration (the CSD can be installed only on parcel-deployed clusters).
  3. Download, distribute, and activate the Kafka parcel, following the instructions here. After you activate the Kafka parcel, Cloudera Manager prompts you to restart the cluster. Click the Close button to ignore this prompt. You do not need to restart the cluster after installing Kafka.
  4. Add the Kafka service to your cluster, following the instructions here.
  5. See the full installation/configuration guide here.
  6. Leave feedback or ask questions on the community forums.

This version of Apache Kafka integration is not integrated into CDH or Cloudera Manager and is only available from Cloudera Labs. Cloudera Labs is a virtual container for innovations that are currently in incubation within Cloudera Engineering. Its goal is to bring more use cases, productivity, or other types of value to developers by constantly exploring new solutions for their problems. Although the Apache Kafka integration is neither supported nor intended for production use, you may find it interesting for experimentation or personal projects.