Kafka: how to calculate the value of log.retention.bytes

kafka + how to calculate the value of log.retention.byte

One of the main roles of the log.retention.bytes parameter is to keep the Kafka disk from filling up. In other words, it purges log data so that the Kafka disk does not become full.

According to the following link: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_kafka-component-guide/content/kafka-brok...

log.retention.bytes is the amount of data to retain in the log for each topic partition. By default, log size is unlimited (the default value is -1).

The documentation also notes that this is a per-partition limit, so you multiply this value by the number of partitions to calculate the total data retained for the topic.

To understand this well, let's take a small example (hands-on is always much better).

On the Kafka machine, under /var/kafka/kafka-logs, we have the following topic partitions. The topic name is lop.avo.prt.prlop.

Example of the topic partitions under /var/kafka/kafka-logs:

lop.avo.prt.prlop-1
lop.avo.prt.prlop-2
lop.avo.prt.prlop-3
lop.avo.prt.prlop-4
lop.avo.prt.prlop-5
lop.avo.prt.prlop-6
lop.avo.prt.prlop-7
lop.avo.prt.prlop-8
lop.avo.prt.prlop-9
lop.avo.prt.prlop-10
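To apply the per-partition note to a listing like the one above, here is a small sketch that counts partition directories per topic (Kafka names them `<topic>-<partition>`) and multiplies the count by a per-partition limit. The helper names `partitions_per_topic` and `topic_retention_bytes` are hypothetical, and 7 GiB is used only as an example value.

```python
from collections import Counter

def partitions_per_topic(dir_names):
    """Count partition directories per topic.

    Partition directories are named <topic>-<partition>, so the topic is
    everything before the last '-'. Names without a numeric suffix
    (e.g. meta.properties, checkpoint files) are skipped.
    """
    counts = Counter()
    for name in dir_names:
        topic, sep, partition = name.rpartition("-")
        if sep and partition.isdigit():
            counts[topic] += 1
    return counts

def topic_retention_bytes(per_partition_bytes, partition_count):
    """Total retained for a topic per the note: per-partition limit x partitions."""
    return per_partition_bytes * partition_count

# Directory names from the listing above.
dirs = [f"lop.avo.prt.prlop-{i}" for i in range(1, 11)]
counts = partitions_per_topic(dirs)
print(counts["lop.avo.prt.prlop"])                                      # 10
print(topic_retention_bytes(7 * 1024**3, counts["lop.avo.prt.prlop"]))  # 75161927680
```

On a real broker you could pass `os.listdir("/var/kafka/kafka-logs")` instead of the hard-coded list; internal topics such as `__consumer_offsets` would then show up as their own entries.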

And under each partition we have the following log segment files (example):

4.0K    00000000000000023657.index
268K    00000000000000023657.log
4.0K    00000000000000023657.timeindex
4.0K    00000000000000023854.index
24K     00000000000000023854.log
4.0K    00000000000000023854.timeindex
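To compare a partition's real size against a candidate limit, a sketch like the following sums the `.log` segment bytes in each partition directory. It assumes that size-based retention is measured against segment data, so it deliberately ignores `.index` and `.timeindex` files; note that it reports apparent file sizes, which can differ from the block counts `du` shows above. The function name is hypothetical.

```python
import os

def partition_sizes(log_dir):
    """Sum the .log segment bytes in each partition directory under log_dir.

    Assumption: only .log files are counted; .index/.timeindex files are
    ignored because retention is assumed to be measured on segment data.
    """
    sizes = {}
    for entry in os.scandir(log_dir):
        if not entry.is_dir():
            continue  # skip meta.properties, checkpoint files, etc.
        total = 0
        for f in os.scandir(entry.path):
            if f.is_file() and f.name.endswith(".log"):
                total += f.stat().st_size
        sizes[entry.name] = total
    return sizes

LOG_DIR = "/var/kafka/kafka-logs"  # path from the post; use your broker's log.dirs
if os.path.isdir(LOG_DIR):
    for name, size in sorted(partition_sizes(LOG_DIR).items()):
        print(f"{size:>15,d}  {name}")
```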

In the Ambari cluster we have 3 Kafka machines (3 brokers). For storage, each Kafka machine has a 100G disk.

Let's say that we want to purge the topic's logs when the disk reaches 70% of its total size.

So now let's try to calculate the value of log.retention.bytes from the above information.

Because we have 10 topic partitions and we want to limit the total disk usage to 70G, my assumption is to do the calculation as follows:

Each partition will be limited to 7G, and 7G in bytes is 7516192768 bytes (7 × 1024 × 1024 × 1024).

7G × 10 = 70G (70% of the total disk)

So it seems that log.retention.bytes should be set to 7516192768 in order to limit each partition to 7516192768 bytes.
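The same arithmetic as a small Python sketch. It assumes, as the calculation above does, that "G" means GiB (1024^3 bytes) and that these 10 partitions are the only data on the 100G disk. The function name is hypothetical; integer percentages are used so the byte count is not affected by float rounding.

```python
GIB = 1024 ** 3  # assumption: the post's "G" means GiB (2**30 bytes)

def per_partition_retention(disk_bytes, target_percent, partitions):
    """Split a disk-usage budget evenly across the partitions on one broker.

    Integer arithmetic avoids float rounding in the byte count.
    """
    budget = disk_bytes * target_percent // 100
    return budget // partitions

value = per_partition_retention(100 * GIB, 70, 10)
print(value)  # 7516192768
```

The result is the number that would go into the broker setting log.retention.bytes (or the topic-level retention.bytes override).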

Does my assumption make sense?

If not, what is the right way to calculate log.retention.bytes, given that the Kafka disk is 100G and we have only 10 topic partitions under /var/kafka/kafka-logs?
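One factor the even split above leaves out, as I understand Kafka's size-based retention: the broker deletes whole segment files, oldest first, and only while the remaining partition size stays at or above the limit. A partition can therefore sit above log.retention.bytes by up to roughly one segment (log.segment.bytes, 1 GiB by default), plus whatever is written between retention checks (log.retention.check.interval.ms). Index files and other topics on the same disk, such as __consumer_offsets, also count against the 70%. The sketch below reserves one segment per partition; the formula is an approximation under those assumptions, not something taken from the Kafka documentation, and the helper names are hypothetical.

```python
GIB = 1024 ** 3

def worst_case_usage(retention_bytes, segment_bytes, partitions):
    """Approximate peak bytes: each partition may keep up to one extra segment."""
    return (retention_bytes + segment_bytes) * partitions

def retention_for_budget(budget_bytes, segment_bytes, partitions):
    """Largest per-partition retention whose approximate worst case fits the budget."""
    per_partition = budget_bytes // partitions - segment_bytes
    if per_partition <= 0:
        raise ValueError("segment size alone exceeds the per-partition budget")
    return per_partition

# 7 GiB per partition with the default 1 GiB segments, 10 partitions:
print(worst_case_usage(7 * GIB, 1 * GIB, 10) // GIB)  # 80
# Retention that keeps the approximate worst case within 70 GiB:
print(retention_for_budget(70 * GIB, 1 * GIB, 10))    # 6442450944
```

Under this approximation, 7516192768 per partition could peak near 80 GiB rather than 70 GiB, while about 6 GiB per partition (or a smaller log.segment.bytes) would keep the estimate within the 70% target.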

Michael-Bronson