Overview

HDP releases 2.2.0 through 2.6.1 have shipped the following versions of Apache Kafka: 0.8.1, 0.8.2, 0.9.0, 0.10.0, 0.10.1. The current Apache Kafka release is 0.11.

Hortonworks is working to make Kafka easier for enterprises to use. New focus areas include a Kafka Admin Panel for creating and deleting topics and managing user permissions, easier and safer distribution of security tokens, and support for multiple ways of publishing and consuming data through a Kafka REST server/API. Here are a few areas of strong contribution:

Operations:

  • Rack awareness for increased resilience and availability: replicas are placed so that they span multiple racks or availability zones.
  • Automated replica leader election: the cluster detects uneven leader distribution, where some brokers serve more data than others, and adjusts it so that leaders are spread evenly.
  • Message timestamps: every message in Kafka now has a timestamp field that indicates when the message was produced.
  • SASL improvements, including external authentication servers and support for multiple types of SASL authentication on one server.
  • Ambari Views for visualizing Kafka operational metrics.
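
The leader election item above can be illustrated with a small sketch. It is not broker code; it assumes that the first replica in each partition's replica list is the preferred leader and that a broker counts as imbalanced when the share of partitions it should lead but does not exceeds a percentage threshold (the broker setting `leader.imbalance.per.broker.percentage` plays this role, with a default of 10). The function names, topic names and cluster layout are hypothetical.

```python
from collections import defaultdict


def leader_imbalance(assignments, leaders):
    """Return, per broker, the fraction of partitions it should lead but does not.

    assignments: partition -> ordered replica list (first entry = preferred leader)
    leaders:     partition -> broker currently acting as leader
    """
    preferred_total = defaultdict(int)
    not_leading = defaultdict(int)
    for partition, replicas in assignments.items():
        preferred = replicas[0]
        preferred_total[preferred] += 1
        if leaders[partition] != preferred:
            not_leading[preferred] += 1
    return {b: not_leading[b] / preferred_total[b] for b in preferred_total}


def brokers_to_rebalance(assignments, leaders, threshold_pct=10):
    """Brokers whose imbalance ratio exceeds the threshold percentage."""
    ratios = leader_imbalance(assignments, leaders)
    return sorted(b for b, r in ratios.items() if r > threshold_pct / 100)


# Hypothetical 3-broker cluster: broker 1 is the preferred leader for two
# partitions but currently leads neither (for example, after a restart).
assignments = {"t-0": [1, 2, 3], "t-1": [2, 3, 1], "t-2": [3, 1, 2], "t-3": [1, 3, 2]}
leaders = {"t-0": 2, "t-1": 2, "t-2": 3, "t-3": 3}

print(brokers_to_rebalance(assignments, leaders))
```

In this sample only broker 1 is flagged, so a preferred-leader election would hand `t-0` and `t-3` back to it.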

Security:

  • Kafka security covers several needs: encrypting the data flowing through Kafka, preventing rogue agents from publishing data to Kafka, and managing access to specific topics at the individual or group level.
  • As a result, recent Kafka releases support wire encryption via SSL, Kerberos-based authentication, and granular authorization via Apache Ranger or another pluggable authorization system.
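
To make topic-level authorization more concrete, here is a minimal model of how access rules for individuals and groups might be evaluated. It is neither the Ranger policy engine nor a Kafka authorizer API; the `Acl` class, the principal names and the topic name are hypothetical. The sketch assumes that a matching deny rule overrides any allow rule and that a request with no matching rule is rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    principal: str   # e.g. "User:alice" or "Group:analysts" (hypothetical names)
    permission: str  # "ALLOW" or "DENY"
    operation: str   # e.g. "READ" or "WRITE"
    topic: str


def is_authorized(acls, principals, operation, topic):
    """Assumed rule: deny wins over allow; no matching rule means no access."""
    matching = [
        a for a in acls
        if a.topic == topic and a.operation == operation and a.principal in principals
    ]
    if any(a.permission == "DENY" for a in matching):
        return False
    return any(a.permission == "ALLOW" for a in matching)


acls = [
    Acl("Group:analysts", "ALLOW", "READ", "clickstream"),
    Acl("User:mallory", "DENY", "READ", "clickstream"),
    Acl("User:ingest", "ALLOW", "WRITE", "clickstream"),
]

# alice reads through her group membership
print(is_authorized(acls, {"User:alice", "Group:analysts"}, "READ", "clickstream"))
# mallory is in the same group but is denied individually
print(is_authorized(acls, {"User:mallory", "Group:analysts"}, "READ", "clickstream"))
# nobody granted alice WRITE, so the request is rejected
print(is_authorized(acls, {"User:alice", "Group:analysts"}, "WRITE", "clickstream"))
```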

The rest of this article lists new features beyond the Hortonworks contributions. At a high level, the overall community has added the following:

  • Kafka Streams API
  • Kafka Connect API
  • New unified Consumer API
  • Transport encryption using TLS/SSL
  • Kerberos/SASL Authentication support
  • Access Control Lists
  • Timestamps on messages
  • Reduced client dependence on ZooKeeper (offsets stored in a Kafka topic)
  • Client interceptors
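
As an illustration of offsets being stored in a Kafka topic rather than in ZooKeeper, the sketch below estimates which partition of the internal `__consumer_offsets` topic would hold a consumer group's offsets. It assumes the broker hashes the group id the way Java's `String.hashCode()` does, takes the absolute value, and applies a modulo over the partition count (50 by default, set by `offsets.topic.num.partitions`). The function names and group ids are hypothetical, and the result should be treated as an approximation of broker behavior, not a guarantee.

```python
def java_string_hash(s):
    """Replicate Java's String.hashCode() as a signed 32-bit integer."""
    units = s.encode("utf-16-be")  # Java hashes UTF-16 code units
    h = 0
    for i in range(0, len(units), 2):
        unit = (units[i] << 8) | units[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF
    # Convert the unsigned 32-bit value to Java's signed int range
    return h - 0x100000000 if h >= 0x80000000 else h


def offsets_partition(group_id, num_partitions=50):
    """Assumed mapping from a group id to a __consumer_offsets partition."""
    h = java_string_hash(group_id)
    # Absolute value, with the minimum int mapped to 0 so the result is never negative
    positive = 0 if h == -0x80000000 else abs(h)
    return positive % num_partitions


for group in ("billing-consumers", "clickstream-etl"):
    print(group, "->", offsets_partition(group))
```

Every consumer in the same group maps to the same partition, which is why a single broker (the leader of that partition) can act as the group's offset manager.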

New Features Since HDP 2.2

Here are the new features for each version, as listed in its release notes.

Kafka 0.8.1: https://archive.apache.org/dist/kafka/0.8.1/RELEASE_NOTES.html

  • [KAFKA-330] - Add delete topic support
  • [KAFKA-554] - Move all per-topic configuration into ZK and add to the CreateTopicCommand
  • [KAFKA-615] - Avoid fsync on log segment roll
  • [KAFKA-657] - Add an API to commit offsets
  • [KAFKA-925] - Add optional partition key override in producer
  • [KAFKA-1092] - Add server config parameter to separate bind address and ZK hostname
  • [KAFKA-1117] - tool for checking the consistency among replicas

Kafka 0.8.2: https://archive.apache.org/dist/kafka/0.8.2.0/RELEASE_NOTES.html

  • [KAFKA-1000] - Inbuilt consumer offset management feature for Kafka
  • [KAFKA-1227] - Code dump of new producer
  • [KAFKA-1384] - Log Broker state
  • [KAFKA-1443] - Add delete topic to topic commands and update DeleteTopicCommand
  • [KAFKA-1512] - Limit the maximum number of connections per ip address
  • [KAFKA-1597] - New metrics: ResponseQueueSize and BeingSentResponses
  • [KAFKA-1784] - Implement a ConsumerOffsetClient library

Kafka 0.9.0: https://archive.apache.org/dist/kafka/0.9.0.0/RELEASE_NOTES.html

  • [KAFKA-1499] - Broker-side compression configuration
  • [KAFKA-1785] - Consumer offset checker should show the offset manager and offsets partition
  • [KAFKA-2120] - Add a request timeout to NetworkClient
  • [KAFKA-2187] - Introduce merge-kafka-pr.py script

Kafka 0.10.0: https://archive.apache.org/dist/kafka/0.10.0.0/RELEASE_NOTES.html

  • [KAFKA-2832] - support exclude.internal.topics in new consumer
  • [KAFKA-3046] - add ByteBuffer Serializer&Deserializer
  • [KAFKA-3490] - Multiple version support for ducktape performance tests

Kafka 0.10.0.1: https://archive.apache.org/dist/kafka/0.10.0.1/RELEASE_NOTES.html

  • [KAFKA-3538] - Abstract the creation/retrieval of Producer for stream sinks for unit testing

Kafka 0.10.1: https://archive.apache.org/dist/kafka/0.10.1.0/RELEASE_NOTES.html

  • [KAFKA-1464] - Add a throttling option to the Kafka replication tool
  • [KAFKA-3176] - Allow console consumer to consume from particular partitions when new consumer is used.
  • [KAFKA-3492] - support quota based on authenticated user name
  • [KAFKA-3776] - Unify store and downstream caching in streams
  • [KAFKA-3858] - Add functions to print stream topologies
  • [KAFKA-3909] - Queryable state for Kafka Streams
  • [KAFKA-4015] - Change cleanup.policy config to accept a list of valid policies
  • [KAFKA-4093] - Cluster id

Final Notes

Apache Kafka shines in use cases like:

  • replacement for a more traditional message broker
  • user activity tracking pipeline as a set of real-time publish-subscribe feeds (the original use case)
  • operational monitoring data
  • log aggregation
  • stream processing
  • event sourcing
  • commit log

Apache Kafka continues to be a dynamic and extremely popular project with steadily growing adoption.
