Created 10-03-2017 06:46 AM
In our Ambari cluster (version 2.6) we have master machines and worker machines, and Kafka is installed on the master machines.
The /data partition is only 15G, and the Kafka log folder is /data/vars/kafka/kafka-logs.
Most of the folders under /data/vars/kafka/kafka-logs are only 4K-40K in size, but two folders are very large (5G-7G), and this causes /data to reach 100%.
Example, under the /data/vars/kafka/kafka-logs/mmno.aso.prpl.proces-90 folder:
-rw-r--r-- 1 kafka hadoop 1073037840 Oct 2 14:05 00000000000000000000.log
-rw-r--r-- 1 kafka hadoop 9480 Oct 2 14:05 00000000000000000000.index
-rw-r--r-- 1 kafka hadoop 13596 Oct 2 14:05 00000000000000000000.timeindex
-rw-r--r-- 1 kafka hadoop 1073464387 Oct 2 14:45 00000000000001419960.log
-rw-r--r-- 1 kafka hadoop 9632 Oct 2 14:45 00000000000001419960.index
-rw-r--r-- 1 kafka hadoop 14412 Oct 2 14:45 00000000000001419960.timeindex
-rw-r--r-- 1 kafka hadoop 1073132221 Oct 2 15:23 00000000000002840641.log
du -sh *
12K 00000000000000000000.index
1.0G 00000000000000000000.log <---
Is it possible to limit the size of these logs, or is there another solution (some variables in the Ambari GUI that need to be added or reconfigured)?
Kafka has only a small /data folder and its logs should not be 1G in size, so how can we solve this?
Created 10-03-2017 07:34 AM
Hi @uri ben-ari,
Check whether you are running Kafka in DEBUG mode; it can generate a huge amount of logs.
You can modify these settings under Kafka -> Configs -> Advanced kafka-log4j:
set log4j.rootLogger=INFO, stdout
Additionally, check:
Kafka Controller Log: # of backup files, Kafka Controller Log: # of backup file size, Kafka Log: # of backup files, Kafka Log: # of backup file size
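Those four Ambari properties map onto the rolling-file appenders in the kafka-log4j template. A minimal sketch of what the relevant lines typically look like (the kafkaAppender name follows the stock Kafka log4j.properties, and the size/backup values here are examples rather than your current settings):
# illustrative kafka-log4j snippet; appender name and values are examples only
log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log
log4j.appender.kafkaAppender.MaxFileSize=256MB
log4j.appender.kafkaAppender.MaxBackupIndex=20
log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n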
Thanks,
Aditya
Created 10-03-2017 07:42 AM
log4j.rootLogger=INFO, stdout is already set in my Ambari cluster, so what else could it be?
Created 10-03-2017 07:45 AM
Kafka Controller Log and Kafka Log: backup file size is 256M
Created 10-03-2017 07:45 AM
Kafka Controller Log: # of backup files - 20
Created 10-03-2017 07:46 AM
Kafka Log: # of backup files - 20
Created 10-03-2017 07:53 AM
Hi @uri ben-ari,
Yes, that's possible; consider the following:
Partitions:
Increasing the partitions keeps the data in more log files, which gives the benefit of increased parallelism and reduces the size of each log file (it increases the number of files).
At the same time this will not reduce the data volume on disk; it only splits it across multiple files.
Procedure (add a new topic-level option or change an existing one; see the sketch just below for raising the partition count itself):
bin/kafka-configs.sh --alter --zookeeper <Zookeeper_server>:2181 --entity-name <topicName> --entity-type topics --add-config cleanup.policy=compact
and then ensure the partition reassignment script is executed with the --execute option:
bin/kafka-reassign-partitions.sh
More on these utilities, with syntax and examples, can be found here.
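For reference, a minimal sketch of how the partition count itself is usually raised and rebalanced on a Kafka 0.10 / HDP 2.6 cluster (zk-host is a placeholder for any of your ZooKeeper servers; the topic name comes from your directory listing and the partition count of 120 is purely illustrative):
# check the current partition count for the topic
bin/kafka-topics.sh --zookeeper zk-host:2181 --describe --topic mmno.aso.prpl.proces
# raise the partition count; existing data is not moved by this step
bin/kafka-topics.sh --zookeeper zk-host:2181 --alter --topic mmno.aso.prpl.proces --partitions 120
# optionally rebalance partitions across brokers with a reassignment plan (reassign.json is a plan you generate first with --generate)
bin/kafka-reassign-partitions.sh --zookeeper zk-host:2181 --reassignment-json-file reassign.json --execute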
Data Retention:
If you don't need to keep the data indefinitely, it can be purged once the retention limit is reached.
This can be set at topic creation time with the --config option (retention.bytes or retention.ms), or altered later, as in the examples below.
# Example
bin/kafka-configs.sh --zookeeper <zookeeper_server>:2181 --entity-type topics --alter --add-config retention.ms=86400000 --entity-name <topic_name>
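Since /data is only 15G, a size-based limit may be more predictable than a time-based one. A minimal sketch along the same lines (zk-host is a placeholder, the topic name comes from your directory listing, and the 1 GB retention / 256 MB segment values are only illustrative):
# cap retained data at ~1 GB per partition and roll segments at 256 MB so old segments can be deleted sooner
bin/kafka-configs.sh --zookeeper zk-host:2181 --entity-type topics --entity-name mmno.aso.prpl.proces --alter --add-config retention.bytes=1073741824,segment.bytes=268435456
# verify the overrides now set on the topic
bin/kafka-configs.sh --zookeeper zk-host:2181 --entity-type topics --entity-name mmno.aso.prpl.proces --describe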
Hope this helps!!
Created 10-03-2017 08:09 AM
How do I find the Zookeeper_server value and the topicName value?
Created 10-03-2017 08:28 AM
Hi @uri ben-ari,
The ZooKeeper server name can be found in Ambari (any of the ZooKeeper servers will do).
The Kafka topic name is the directory name without the partition index (the part after kafka-logs, e.g. mmno.aso.prpl.proces); see the sketch below for listing and confirming topic names.
On another note: these logs are Kafka messages, not application logs, so please look at reducing the retention of the topic, which will purge some of the unused messages from the topic.
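A quick sketch for finding both values (zk-host is a placeholder for whichever ZooKeeper host Ambari lists):
# list all topic names known to the cluster
bin/kafka-topics.sh --zookeeper zk-host:2181 --list
# the directory mmno.aso.prpl.proces-90 under kafka-logs is topic "mmno.aso.prpl.proces", partition 90
bin/kafka-topics.sh --zookeeper zk-host:2181 --describe --topic mmno.aso.prpl.proces
# confirm which topic directories are actually filling /data
du -sh /data/vars/kafka/kafka-logs/* | sort -h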