We would like to propose the following, based on the issues we have had with our Kafka disks.
We have many HDP clusters (managed by Ambari; all machines run Red Hat 7.2).
Each cluster includes 3 Kafka machines, and each Kafka machine has a disk of ~15 TB.
We have had many cases where the disk filled up to 100% of its capacity (for some reason, Kafka retention does not work as it should).
So we are considering a cron job that runs on the Kafka machines every minute.
When the Kafka disk usage reaches a threshold, for example ~90%,
the cron job will stop all Kafka brokers (the Kafka service).
This way we keep the Kafka disk from reaching 100% (as everyone knows, once the disk is at 100% the purging process no longer works).
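To make the idea concrete, here is a minimal sketch of the check such a cron job could run. It is only an illustration under assumptions: the function names, the `dry_run` flag, and the path `/` are hypothetical, and `stop_cmd` is a placeholder for whatever command actually stops the broker on your cluster. Note that `shutil.disk_usage` computes used/total, which can differ slightly from the `Use%` column of `df`.

```python
import shutil
import subprocess

THRESHOLD = 0.90  # example threshold from the post (~90%)


def used_fraction(path):
    """Return the used fraction (0.0-1.0) of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def should_stop(fraction, threshold=THRESHOLD):
    """True once disk usage has reached the threshold."""
    return fraction >= threshold


def check_and_stop(path, stop_cmd, threshold=THRESHOLD, dry_run=True):
    """Run `stop_cmd` if the disk holding `path` is at or above `threshold`.

    `stop_cmd` is a placeholder (a list of arguments); with dry_run=True
    nothing is executed and only the decision is returned.
    """
    if not should_stop(used_fraction(path), threshold):
        return False
    if not dry_run:
        subprocess.run(stop_cmd, check=True)
    return True


if __name__ == "__main__":
    # "/" stands in for the Kafka data mount and `echo` for the real stop command.
    triggered = check_and_stop("/", ["echo", "stop kafka broker"], dry_run=True)
    print("stop triggered:", triggered)
```

Scheduled every minute from cron, a script like this would only act once the threshold is crossed; keeping `dry_run=True` at first lets you see how often it would have fired before letting it stop anything.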
Please share your opinion
Hi @Michael Bronson ,
Any insight into why you think "retention does not work as it should"?
It would also be helpful if you could provide some more details about how your Kafka cluster is used: is data flooding in steadily, are there heavy spikes that lead to _partition full_, how many producers run in parallel, how many topics and with what replication, etc.
How did you configure the retention?
Here are the details:
Kafka retention hours - 7 days
Kafka retention bytes - 130G (I converted the configured value to G)
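One thing worth checking against these numbers: as far as I know, Kafka applies the retention bytes limit per partition, not per disk, and size-based retention only removes closed segments, so each partition can sit roughly one segment above the limit. The sketch below is a rough estimate under assumptions: decimal units for G and T, a segment size of about 1 GB (close to Kafka's default `log.segment.bytes`), and hypothetical partition counts, since the post does not give them.

```python
GB = 1000**3  # decimal units; the post does not say whether G/T are decimal or binary
TB = 1000**4

DISK_BYTES = 15 * TB        # ~15 T per Kafka machine, from the post
RETENTION_BYTES = 130 * GB  # retention bytes, from the post
SEGMENT_BYTES = 1 * GB      # assumption: roughly the default log.segment.bytes


def worst_case_bytes(partition_replicas,
                     retention_bytes=RETENTION_BYTES,
                     segment_bytes=SEGMENT_BYTES):
    """Approximate upper bound on log data for the partition replicas on one broker."""
    return partition_replicas * (retention_bytes + segment_bytes)


def max_partition_replicas(disk_bytes=DISK_BYTES,
                           retention_bytes=RETENTION_BYTES,
                           segment_bytes=SEGMENT_BYTES):
    """How many partition replicas fit on one broker before the bound exceeds the disk."""
    return disk_bytes // (retention_bytes + segment_bytes)


if __name__ == "__main__":
    print("partition replicas that fit on one broker:", max_partition_replicas())
    for replicas in (50, 100, 200):  # hypothetical counts, not from the post
        fill = worst_case_bytes(replicas) / DISK_BYTES
        print(f"{replicas} replicas -> up to {fill:.0%} of the disk")
```

If the number of partition replicas hosted on a broker is above the figure this prints, the size limit alone cannot keep the disk from filling within the 7-day window, even when retention is working as designed.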