Created 12-12-2016 05:39 PM
I checked this reference http://kafka.apache.org/documentation.html#java and see that LinkedIn recommends G1 GC. I'd like to understand more about this decision and which settings are relevant to tune for throughput/performance.
Created 12-12-2016 05:48 PM
I had this question myself a few months ago. I did a little bit of research and learned the following:
- Garbage First (or G1) was introduced with Java 7. It is designed to adjust automatically to different workloads and to provide consistent GC pause times over the lifetime of the application. It also handles large heap sizes with ease, by segmenting the heap into smaller zones (regions) and not collecting over the entire heap in each pause.
- There are two configuration options for G1 that are relevant for performance:
a) MaxGCPauseMillis - the preferred pause time for each garbage collection cycle. The default value is 200 milliseconds, but it is a target rather than a fixed limit, and G1 can exceed it. G1 will attempt to schedule the frequency of GC cycles, as well as the number of zones collected in each cycle, so that each cycle takes approximately 200 ms.
b) InitiatingHeapOccupancyPercent - specifies the percentage of the total heap that may be in use before G1 starts a new collection cycle. The default is 45%. This counts eden and old zone usage together.
Last time I checked, the Kafka start script did not use the G1 collector, defaulting instead to parallel new (ParNew) and Concurrent Mark Sweep (CMS) GC. You may need to make the change manually via the environment variable KAFKA_JVM_PERFORMANCE_OPTS.
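As a rough sketch of what that override could look like, the variable can be exported in the shell that launches the broker. The values below are just the defaults quoted above, not tuned recommendations, and the start command in the comment assumes a standard Kafka tarball layout, which may not match your install.

```shell
# Sketch: select G1 and set the two options discussed above.
# The values are the defaults quoted in this thread, shown only as a
# starting point; they are not tuned recommendations.
export KAFKA_JVM_PERFORMANCE_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=45"

# Then start the broker from the same shell. The paths below assume a
# standard Kafka tarball layout and may differ on your install:
# bin/kafka-server-start.sh config/server.properties
```

Keep in mind that a value set this way is used in place of whatever the start script would otherwise supply, so any other JVM performance flags you rely on need to be included in the same string.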
Created 12-12-2016 05:53 PM
Thanks @Constantin Stanca. That was very helpful. One note: the environment variable name should be spelled KAFKA_JVM_PERFORMANCE_OPTS.
Created 12-29-2016 07:07 PM
To summarize, G1GC provides predictable GC pause times, which is critical for real-time applications (like Kafka, Storm, Solr, etc.). The reasoning is to avoid long stop-the-world garbage collection pauses, which would result in back pressure in heavy ingest environments.