Apache Kafka Network Bandwidth Quotas

Quotas enable cluster servers to impose limits on the volume of data or the number of requests served to clients. This mitigates the effects of DoS [1] attacks and prevents ill-behaved clients from becoming a source of DoS towards other clients.

In Apache Kafka there are two types of quotas. Network bandwidth quotas (since v0.9) impose byte rate thresholds and are specified in bytes/s. Request rate quotas are specified as a percentage of CPU utilization.

Managing Quotas

In Kafka quotas are unlimited by default. It is possible to set quota defaults and overrides for tuples (user,client-id), users and clients [2]. Client quotas can be enforced in secure and non-secure clusters using the client property client.id. User and (user,client-id) quotas are suitable for secure environments.

All the clients that belong to the same logical grouping [3] will be subject to the quota for that grouping. Furthermore, the quota will be shared across all the instances belonging to that grouping. For instance, all consumers with client.id=client1 that run as user1 are subject to the quota set for the tuple (user1,client1).

This article discusses network bandwidth quotas in detail. They are set using the script kafka-configs.sh. Producer and consumer quotas use the config keys producer_byte_rate and consumer_byte_rate, respectively.

Quota defaults and overrides are written to zookeeper and are read by all brokers immediately. Therefore, quotas are enforced on clients without the need for a restart.

There are four quota operations for each of the aforementioned logical groupings: Set default [4], override, describe, delete.

Client-id Quotas

Client quotas are enforced for all clients belonging to the same logical grouping, which is set by specifying the property client.id in producer.properties or consumer.properties. For example, a process is subject to the client quota set for client1 [5] if it has client.id=client1 in its properties file, i.e. it belongs to the client1 logical grouping.

Set Default

kafka-configs.sh --zookeeper localhost:2181 --alter --add-config 'producer_byte_rate=1024,consumer_byte_rate=2048' --entity-type clients --entity-default

Override

kafka-configs.sh  --zookeeper localhost:2181 --alter --add-config 'producer_byte_rate=1024,consumer_byte_rate=2048' --entity-type clients --entity-name client1

Describe

kafka-configs.sh  --zookeeper localhost:2181 --describe --entity-type clients [--entity-name client1]

Delete

kafka-configs.sh  --zookeeper localhost:2181 --alter --delete-config 'producer_byte_rate,consumer_byte_rate' --entity-type clients --entity-name client1

User Quotas

The identity of authenticated clients in secure clusters is the user principal. For quotas, the user in a secure context typically refers to the local user, as resolved by the principal-to-local Kerberos rule that matches the principal of the authenticated user. These rules are specified using the Kafka broker property sasl.kerberos.principal.to.local.rules [A].

By default, principal names of the form {username}/{hostname}@{REALM} are mapped to {username} [A]. For example, the principal user1/host1@REALM will resolve to local user ‘user1’.
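The default mapping described above can be illustrated with a minimal Python sketch. The helper name default_local_user and the regular expression are hypothetical and only approximate the default behaviour for principals of the form {username}/{hostname}@{REALM}; they do not implement the full sasl.kerberos.principal.to.local.rules syntax.

```python
import re

# Hypothetical pattern: matches only principals of the form
# {username}/{hostname}@{REALM}, as described in the text above.
_PRINCIPAL = re.compile(r"^(?P<user>[^/@]+)/(?P<host>[^/@]+)@(?P<realm>[^/@]+)$")


def default_local_user(principal: str) -> str:
    """Return the {username} part of a {username}/{hostname}@{REALM} principal."""
    match = _PRINCIPAL.match(principal)
    if match is None:
        raise ValueError(f"unexpected principal format: {principal!r}")
    return match.group("user")


if __name__ == "__main__":
    print(default_local_user("user1/host1@REALM"))  # user1
```

The resulting short name is the value that would be passed to --entity-name when setting a user quota.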

Set Default

kafka-configs.sh  --zookeeper localhost:2181 --alter --add-config 'producer_byte_rate=1024,consumer_byte_rate=2048' --entity-type users --entity-default

Override

kafka-configs.sh --zookeeper localhost:2181 --alter --add-config 'producer_byte_rate=1024,consumer_byte_rate=2048' --entity-type users --entity-name user1

Describe

kafka-configs.sh  --zookeeper localhost:2181 --describe --entity-type users [--entity-name user1]

Delete

kafka-configs.sh --zookeeper localhost:2181 --alter --delete-config 'producer_byte_rate,consumer_byte_rate' --entity-type users --entity-name user1

Tuple (user,client-id) quotas

Tuple quotas are the most specific and have the highest priority. Processes that match the tuple quota disregard any quota defaults or overrides set for client and/or user.

Set Default

Default client quotas for user1

kafka-configs.sh --zookeeper localhost:2181 --alter --add-config 'producer_byte_rate=1024,consumer_byte_rate=2048' --entity-type users --entity-name user1 --entity-type clients --entity-default

Override

kafka-configs.sh  --zookeeper localhost:2181 --alter --add-config 'producer_byte_rate=1024,consumer_byte_rate=2048' --entity-type users --entity-name user1 --entity-type clients --entity-name client1

Describe

kafka-configs.sh  --zookeeper localhost:2181 --describe --entity-type users [--entity-name user1] --entity-type clients [--entity-name client1]

Delete

kafka-configs.sh  --zookeeper localhost:2181 --alter --delete-config 'producer_byte_rate,consumer_byte_rate' --entity-type users --entity-name user1 --entity-type clients --entity-name client1

Quotas Precedence

Quotas specified using the commands listed above create entries in zookeeper as follows:

  • (user,client-id) → /config/users/<user>/clients/<client-id> [6]
  • user → /config/users/<user> [7]
  • client → /config/clients/<client-id> [6]

The paths in zookeeper dictate the following order of precedence [8]:

  1. /config/users/<user>/clients/<client-id>
  2. /config/users/<user>/clients/<default>
  3. /config/users/<user>
  4. /config/users/<default>/clients/<client-id>
  5. /config/users/<default>/clients/<default>
  6. /config/users/<default>
  7. /config/clients/<client-id>
  8. /config/clients/<default>


The highest priority quota is enforced regardless of whether it is larger or smaller than the others. That means that the lowest quota is NOT necessarily the one that is enforced. For example, if client1 has a client quota of 1KB and user1 has a user quota of 1MB, when client1 is run by user1, the quota limit will be 1MB. It is not 1KB because the user quota has higher precedence. On the other hand, client1 would have a quota of 1KB if it is run by user2 when user2 has no quota set (i.e. it has the default, unlimited, quota). To enforce a quota of 1KB for client1 run by user1 you must set a 1KB quota on the tuple (user1, client1).
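The precedence order can be sketched in Python as a first-match lookup over the eight paths listed above. This is only an illustration of the rule, not the broker's code: the function names are hypothetical, and the quotas dictionary is an assumed stand-in for the znodes that holds a single byte rate per path.

```python
from typing import Dict, List, Optional

DEFAULT = "<default>"


def candidate_paths(user: str, client_id: str) -> List[str]:
    """Quota paths for (user, client_id), highest precedence first."""
    return [
        f"/config/users/{user}/clients/{client_id}",
        f"/config/users/{user}/clients/{DEFAULT}",
        f"/config/users/{user}",
        f"/config/users/{DEFAULT}/clients/{client_id}",
        f"/config/users/{DEFAULT}/clients/{DEFAULT}",
        f"/config/users/{DEFAULT}",
        f"/config/clients/{client_id}",
        f"/config/clients/{DEFAULT}",
    ]


def effective_quota(quotas: Dict[str, int], user: str, client_id: str) -> Optional[int]:
    """Return the first quota found in precedence order, or None if unlimited."""
    for path in candidate_paths(user, client_id):
        if path in quotas:
            return quotas[path]
    return None


if __name__ == "__main__":
    # The example from the text: client1 has 1KB, user1 has 1MB.
    quotas = {
        "/config/clients/client1": 1024,
        "/config/users/user1": 1024 * 1024,
    }
    print(effective_quota(quotas, "user1", "client1"))  # 1048576
    print(effective_quota(quotas, "user2", "client1"))  # 1024
```

Adding an entry for /config/users/user1/clients/client1 to the dictionary makes that value win for client1 run by user1, which mirrors the tuple override described above.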

Network Bandwidth Quota Internals

Network bandwidth quotas are computed over a sliding window W that is controlled by the broker properties quota.window.num and quota.window.size.seconds. Quotas are enforced using delays, i.e. when a client exceeds its quota it is paused for a time interval such that the throughput over W does not exceed the quota. More formally:

Let

N = quota.window.num - Number of samples to retain in memory for client quotas [9]
T = quota.window.size.seconds - The time span of each sample for client quotas [9] 
d - Imposed quota, or equivalently, the desired average throughput [10] over W [bytes/s] [11]
c - Client actual average throughput over W [bytes/s]
W - Measurement time window spanning N samples [s]
D - Delay [s] - Amount of time that an eager client must pause in order to satisfy the quota requirements
E - Effective window [s] - Amount of time the client can process data at rate c > d while still satisfying the quota requirement

Assumption

c > d → Client throughput is greater than desired byte rate

By definition

W = quota.window.num * quota.window.size.seconds = N * T [s]
E = W - D  ⇔  D = W - E  … (1)

E <= W

Two consecutive windows W_k and W_k+1, each N samples long, differ by only one sample, i.e. they have N-1 overlapping samples:

W_k   = w_1  w_2  …  w_N

W_k+1 =      w_2  w_3  …  w_N+1

The goal is for the average throughput measured over a window of size W not to exceed d. In the limit, equality holds: the total number of bytes processed during the effective window (c*E) must equal the quota imposed for that window (d*W), i.e.

d*W=c*E ⇔ E=d*W/c  … (2)

Combining equations (1) and (2)

D = W - d*W/c = W(1 - d/c) = W(c - d)/c  … (3)

The delay D is the fraction (c - d)/c of the window W, i.e. the window length scaled by the relative amount by which the current rate c has to be reduced to match the desired rate d.

Example:

N = 5 samples

T = 2 [s]

W = 5*2 = 10 [s]

d = 2MB/s = (2*2*5)/(5*2) - Each of the five 2s samples has a rate of 2MB/s

c = 4MB/s = (2*2*4 + 2*12*1)/(5*2) - The first four 2s samples overlap with the previous window and had a rate of 2MB/s. The last 2s sample had a rate of 12MB/s

Substituting into equation (3)

D = 10*(4-2)/4 = 5s

If the last sample has a byte rate of 12MB/s, causing the average throughput over window W to be 4MB/s, then the client must pause for 5s to satisfy the desired quota.
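The arithmetic of this example can be checked with a short Python sketch that follows equation (3) as derived in this article. The function names are hypothetical, and the code is an illustration of the formula under the article's assumptions rather than the broker's actual implementation.

```python
from typing import Sequence


def window_average(sample_rates: Sequence[float], sample_seconds: float) -> float:
    """Average throughput c over a window of len(sample_rates) equal-length samples."""
    total = sum(rate * sample_seconds for rate in sample_rates)
    return total / (len(sample_rates) * sample_seconds)


def throttle_delay(c: float, d: float, window_seconds: float) -> float:
    """Delay D = W * (c - d) / c from equation (3); zero when the quota is met."""
    if c <= d:
        return 0.0
    return window_seconds * (c - d) / c


if __name__ == "__main__":
    samples = [2, 2, 2, 2, 12]   # MB/s per sample, as in the example
    T = 2                        # quota.window.size.seconds
    W = len(samples) * T         # N * T = 10 s
    c = window_average(samples, T)
    print(c)                         # 4.0
    print(throttle_delay(c, 2, W))   # 5.0
```

A client that stays at or below the quota gets a delay of zero, which corresponds to the case where the assumption c > d does not hold.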

Zookeeper Internals

The commands to set tuple, user, and client quotas create zookeeper node entries with the hierarchy illustrated in section “Quotas Precedence”. An example of the znode contents for default clients, and for tuples (user1,client1) is as follows:

zkCli > get /config/clients/<default>
{"version":1,"config":{"producer_byte_rate":"1024","consumer_byte_rate":"2048"}}
zkCli >  get /config/users/user1/clients/client1
{"version":1,"config":{"producer_byte_rate":"1024","consumer_byte_rate":"2048"}}
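As a minimal sketch, the znode payload shown above can be decoded in Python as follows. The function name parse_quota_znode is hypothetical, and the code assumes the payload has exactly the JSON shape printed by zkCli in this example; it only reads keys ending in _byte_rate.

```python
import json
from typing import Dict


def parse_quota_znode(payload: str) -> Dict[str, int]:
    """Extract the byte-rate quotas (bytes/s) from a quota znode JSON payload."""
    data = json.loads(payload)
    config = data.get("config", {})
    # Values are stored as strings in the payload shown above.
    return {key: int(value) for key, value in config.items() if key.endswith("_byte_rate")}


if __name__ == "__main__":
    payload = '{"version":1,"config":{"producer_byte_rate":"1024","consumer_byte_rate":"2048"}}'
    print(parse_quota_znode(payload))
    # {'producer_byte_rate': 1024, 'consumer_byte_rate': 2048}
```

Such a helper is only useful for read-only inspection; changes should still go through kafka-configs.sh.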

Beware that deleting a quota-specific znode directly in zkCli using the rmr command does not remove the quota for that grouping. The instances belonging to that grouping are still subject to the quota. This creates a scenario that is very hard to debug, because running ls on the parent node in zkCli shows an empty child tree, hinting that no quota is set.

Never delete quota znodes directly using zkCli. To remove a quota, use the appropriate kafka-configs.sh delete command.

Acknowledgments

Thank you to Arpit Khare and Deepna Bains for their help setting up clusters and discussing and testing multiple quota enforcement scenarios. Thank you to Manikumar Reddy for the clarifying answers around quota window internals.

References

[A] - https://kafka.apache.org/documentation.html#brokerconfigs

[B] - https://kafka.apache.org/documentation.html#quotas

[C] - https://kafka.apache.org/documentation.html#design_quotas

Footnotes

[1] - Denial of Service

[2] - When referring to client this article implies Kafka consumers or producers

[3] - Processes belong to the same logical grouping if they have the same tuple (user,client-id), were started by the same user, or have the same client.id

[4] - Broker properties quota.producer.default and quota.consumer.default will be deprecated and their use is not recommended

[5] - Assuming no overriding user quota. See section "Quotas Precedence"

[6] - If the --entity-default option is used then <client-id>=<default>

[7] - If the --entity-default option is used then <user>=<default>

[8] - Highest → Lowest

[9] - Kafka broker property definition

[10] - Throughput and byte rate are used interchangeably

[11] - Units are enclosed in [ ]
