Created 10-03-2017 06:46 AM
In our Ambari cluster (version 2.6) we have master machines and worker machines, and Kafka is installed on the master machines.
The /data partition is only 15G, and the Kafka log folder is /data/vars/kafka/kafka-logs.
Most of the folders under /data/vars/kafka/kafka-logs are only 4K-40K in size, but two folders are very large (5G-7G), and this causes /data to reach 100%.
Example, under the /data/vars/kafka/kafka-logs/mmno.aso.prpl.proces-90 folder:
-rw-r--r-- 1 kafka hadoop 1073037840 Oct 2 14:05 00000000000000000000.log
-rw-r--r-- 1 kafka hadoop 9480 Oct 2 14:05 00000000000000000000.index
-rw-r--r-- 1 kafka hadoop 13596 Oct 2 14:05 00000000000000000000.timeindex
-rw-r--r-- 1 kafka hadoop 1073464387 Oct 2 14:45 00000000000001419960.log
-rw-r--r-- 1 kafka hadoop 9632 Oct 2 14:45 00000000000001419960.index
-rw-r--r-- 1 kafka hadoop 14412 Oct 2 14:45 00000000000001419960.timeindex
-rw-r--r-- 1 kafka hadoop 1073132221 Oct 2 15:23 00000000000002840641.log
du -sh *
12K 00000000000000000000.index
1.0G 00000000000000000000.log <---
Is it possible to limit the size of these logs, or is there another solution (some variables in the Ambari GUI that need to be added or reconfigured)?
Kafka has only a small /data folder and its logs should not be 1G in size, so how can we solve this?
Created 10-03-2017 07:34 AM
Hi @uri ben-ari,
Check whether you are running Kafka in DEBUG mode; it can generate a huge amount of logs.
You can modify these settings under Kafka -> Configs -> Advanced kafka-log4j:
set log4j.rootLogger=INFO, stdout
Additionally, check:
Kafka Controller Log: # of backup files, Kafka Controller Log: # of backup file size, Kafka Log: # of backup files, Kafka Log: # of backup file size
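Those four Ambari properties map onto the rolling-file appenders in the kafka-log4j template. A minimal sketch of what the relevant lines typically look like (the kafkaAppender name follows the stock Kafka log4j.properties, and the size/backup values here are examples rather than your current settings):
# illustrative kafka-log4j snippet; appender name and values are examples only
log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log
log4j.appender.kafkaAppender.MaxFileSize=256MB
log4j.appender.kafkaAppender.MaxBackupIndex=20
log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n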
Thanks,
Aditya
Created 10-03-2017 07:42 AM
log4j.rootLogger=INFO, stdout is already set in my Ambari cluster, so what else could it be?
Created 10-03-2017 07:45 AM
Kafka Controller Log and Kafka Log: backup file size is 256M
Created 10-03-2017 07:45 AM
Kafka Controller Log: # of backup files - 20
Created 10-03-2017 07:46 AM
Kafka Log: # of backup files - 20
Created 10-03-2017 07:53 AM
Hi @uri ben-ari,
Yes, that's possible; consider the following:
Partitions:
Increasing the partitions keeps the data in more log files, which gives the benefit of increased parallelism and reduces the size of each log file (it increases the number of files).
At the same time this will not reduce the data volume on disk; it only splits it across multiple files.
Procedure (add a new topic-level option or change an existing one; see the sketch just below for raising the partition count itself):
bin/kafka-configs.sh --alter --zookeeper <Zookeeper_server>:2181 --entity-name <topicName> --entity-type topics --add-config cleanup.policy=compact
and then ensure the partition reassignment script is executed with the --execute option:
bin/kafka-reassign-partitions.sh
More on these utilities, with syntax and examples, can be found here.
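For reference, a minimal sketch of how the partition count itself is usually raised and rebalanced on a Kafka 0.10 / HDP 2.6 cluster (zk-host is a placeholder for any of your ZooKeeper servers; the topic name comes from your directory listing and the partition count of 120 is purely illustrative):
# check the current partition count for the topic
bin/kafka-topics.sh --zookeeper zk-host:2181 --describe --topic mmno.aso.prpl.proces
# raise the partition count; existing data is not moved by this step
bin/kafka-topics.sh --zookeeper zk-host:2181 --alter --topic mmno.aso.prpl.proces --partitions 120
# optionally rebalance partitions across brokers with a reassignment plan (reassign.json is a plan you generate first with --generate)
bin/kafka-reassign-partitions.sh --zookeeper zk-host:2181 --reassignment-json-file reassign.json --execute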
Data Retention:
If you don't need to keep the data indefinitely, it can be purged once the retention limit is reached.
This can be set at topic creation time with the --config option (retention.bytes or retention.ms), or altered later, as in the examples below.
# Example
bin/kafka-configs.sh --zookeeper <zookeeper_server>:2181 --entity-type topics --alter --add-config retention.ms=86400000 --entity-name <topic_name>
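Since /data is only 15G, a size-based limit may be more predictable than a time-based one. A minimal sketch along the same lines (zk-host is a placeholder, the topic name comes from your directory listing, and the 1 GB retention / 256 MB segment values are only illustrative):
# cap retained data at ~1 GB per partition and roll segments at 256 MB so old segments can be deleted sooner
bin/kafka-configs.sh --zookeeper zk-host:2181 --entity-type topics --entity-name mmno.aso.prpl.proces --alter --add-config retention.bytes=1073741824,segment.bytes=268435456
# verify the overrides now set on the topic
bin/kafka-configs.sh --zookeeper zk-host:2181 --entity-type topics --entity-name mmno.aso.prpl.proces --describe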
Hope this helps!!
Created 10-03-2017 08:09 AM
How do I find the Zookeeper_server value and the topicName value?
Created 10-03-2017 08:28 AM
Hi @uri ben-ari,
The ZooKeeper server name can be found in Ambari (any of the ZooKeeper servers will do).
The Kafka topic name is the directory name without the partition index (the part after kafka-logs, e.g. mmno.aso.prpl.proces); see the sketch below for listing and confirming topic names.
On another note: these logs are Kafka messages, not application logs, so please look at reducing the retention of the topic, which will purge some of the unused messages from the topic.
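A quick sketch for finding both values (zk-host is a placeholder for whichever ZooKeeper host Ambari lists):
# list all topic names known to the cluster
bin/kafka-topics.sh --zookeeper zk-host:2181 --list
# the directory mmno.aso.prpl.proces-90 under kafka-logs is topic "mmno.aso.prpl.proces", partition 90
bin/kafka-topics.sh --zookeeper zk-host:2181 --describe --topic mmno.aso.prpl.proces
# confirm which topic directories are actually filling /data
du -sh /data/vars/kafka/kafka-logs/* | sort -h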