This is from a small development environment, so the heap is much smaller (1 GB) than what you’ll see in a real production cluster.
In the past, we’ve shared with you what to use for GC settings, but have we ever explained why?
GC Tuning Rationale
Let’s address each setting individually and explain why we recommend these settings.
This flag enables the Concurrent Mark Sweep garbage collector. Why do we use this garbage collector?
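The flag in question is the standard HotSpot option for enabling CMS. In a Hadoop deployment it would typically be appended to the daemon's JVM options in hadoop-env.sh; using HADOOP_NAMENODE_OPTS here is just one example of where it could go:

```shell
# hadoop-env.sh: enable the Concurrent Mark Sweep collector for the NameNode
export HADOOP_NAMENODE_OPTS="-XX:+UseConcMarkSweepGC ${HADOOP_NAMENODE_OPTS}"
```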
The JVM offers multiple garbage collector implementations. Each one makes different engineering trade-offs to serve different kinds of applications. Literature on garbage collection discusses a trade-off between responsiveness (how quickly an application responds) vs. throughput (maximizing the total amount of work done by an application in a period of time).
Quoting Oracle’s garbage collection tuning guide:
The concurrent collector is designed for applications that prefer shorter garbage collection pauses and that can afford to share processor resources with the garbage collector while the application is running. Typically applications which have a relatively large set of long-lived data (a large tenured generation), and run on machines with two or more processors tend to benefit from the use of this collector.
These characteristics probably sound familiar for our Hadoop daemons. Taking the NameNode as an example:
We prefer shorter pauses, because we don’t want a client’s RPC call blocked waiting for a garbage collection, which would harm latency.
It’s fine to share CPU resources between the garbage collector and our application logic. Realistic Hadoop deployments run on multi-core/multi-proc machines where the full resources are dedicated to running Hadoop and no other applications.
We have a large set of long-lived data. Every file is represented in memory as an INode object, and it’s alive for the duration of the process, unless someone deletes the file.
CMS suits our usage patterns well.
Is there anything wrong with CMS? If poorly configured, then fragmentation can accumulate in the tenured generation over time. CMS collections do not perform periodic compaction (rearranging existing object allocations into contiguous space) to fix the fragmentation. In the degenerate case, CMS is forced to “stop the world” to do a compaction. This is not a concurrent operation. We do not expect this to happen with our current recommendations for GC configuration.
Personal horror story: I once worked on an application (not Hadoop) with a unique memory allocation pattern that drove CMS into a 2-hour stop-the-world collection and compaction. I’ve been meaning to contact Guinness to find out if I’ve set the world record for longest GC pause.
Sets the number of threads to use for garbage collection (the “concurrency” in “Concurrent Mark Sweep”). Running jstack on the JVM process will show this number of threads named like:
"Gang worker#0 (Parallel GC Threads)"
We expect tuning this correctly to shorten the amount of time required to complete a garbage collection. In practice, we’ve seen that a value of 8 works well on typical hardware.
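The flag being described is the standard HotSpot option controlling the number of GC worker threads, matching the "Parallel GC Threads" naming in the jstack output above. Using the value of 8 the text recommends:

```shell
# 8 GC worker threads, the value we've seen work well on typical hardware
export HADOOP_NAMENODE_OPTS="-XX:ParallelGCThreads=8 ${HADOOP_NAMENODE_OPTS}"
```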
By default, CMS tracks internal heuristics of the application’s usage pattern to determine when to trigger a garbage collection. Typically, this tends to delay collection until the old generation is nearly full. In the worst case, this can degrade to a stop-the-world full garbage collection. By setting these properties, we tell CMS to initiate garbage collection based solely on fraction of heap usage instead of its internal heuristics. The fractional value must be chosen carefully. If it is too large, then it still might not prevent costly full garbage collections. If it is too small, then garbage collection will run too frequently, reducing the application’s overall throughput. In practice, we have found that a value of 70 works well, though it’s possible that individual deployments might need tuning.
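The two standard HotSpot options that implement this behavior are the occupancy fraction and the flag that tells CMS to rely on it exclusively, with the value of 70 the text recommends:

```shell
# Start a CMS collection at 70% old-generation occupancy, and ignore
# the collector's internal heuristics in favor of this fixed threshold.
export HADOOP_NAMENODE_OPTS="-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly ${HADOOP_NAMENODE_OPTS}"
```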
-Xms sets the minimum heap size, and -Xmx sets the maximum heap size. When a JVM process starts, it initially allocates the minimum heap size. If the need for memory arises at runtime, it can go back to the OS and allocate more, up to the maximum heap size. Why do we recommend setting these to the same value? Those incremental memory allocations can cause unpredictable runtime performance. Additionally, starting with a low minimum heap size could cause instability as the need for a larger heap grows over time. If it needs to grow to the maximum, but the OS doesn’t actually have that amount of memory available to hand out, then the process could be subject to OutOfMemoryError conditions, or possibly even the OOM killer stopping the whole process on Linux. Setting minimum and maximum to the same value eliminates these potential sources of instability.
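For the 1 GB development heap mentioned in the introduction, pinning the minimum and maximum to the same value looks like this (the size itself should of course match your deployment):

```shell
# Pin minimum and maximum heap to the same size to avoid incremental
# allocations and the instability they can cause.
export HADOOP_NAMENODE_OPTS="-Xms1g -Xmx1g ${HADOOP_NAMENODE_OPTS}"
```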
These control the minimum and maximum size of the new generation in the generational garbage collector architecture. We've found that for large heaps, the JVM defaults are too small. Our current recommendation is to set this to 1/8 to 1/6 of the total heap size. We recommend setting each of these to the same value for stability reasons, similar to the above discussion of total heap size.
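Continuing the 1 GB example, 1/8 of the heap is 128 MB. The standard HotSpot options for pinning the new generation are:

```shell
# New generation sized at 1/8 of the 1 GB heap; min and max pinned equal
export HADOOP_NAMENODE_OPTS="-XX:NewSize=128m -XX:MaxNewSize=128m ${HADOOP_NAMENODE_OPTS}"
```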
These control the minimum and maximum size of the permanent generation, a portion of the heap dedicated to application metadata, such as Java class definitions and interned strings. If it is too small, then you'll see errors like:
java.lang.OutOfMemoryError: PermGen space
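The standard HotSpot options (for pre-Java 8 JVMs, which still had a permanent generation) are shown below; the 128m value is a placeholder, not a recommendation from the text:

```shell
# Pin PermGen min and max to the same size (128m is a placeholder value)
export HADOOP_NAMENODE_OPTS="-XX:PermSize=128m -XX:MaxPermSize=128m ${HADOOP_NAMENODE_OPTS}"
```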
These settings are our current recommendation for what works well for our usage patterns.
Additionally, some of the settings are helpful to assist with troubleshooting.
This enables verbose garbage collection logging. Each GC log entry shows:
Which garbage collector ran (i.e. ParNew on the new generation vs. CMS for the full GC).
The change in used heap size, in the form <used heap size before>-><used heap size after>.
Timing information. Generally “real” is the most useful metric, because it’s actual clock time.
Possible warning signs that heap or GC configuration is incorrect:
Very frequent garbage collections. In particular, we expect full GC to be rare.
Individual garbage collections take a long “real” time.
Full GC happens frequently and does not significantly reduce the amount of used heap. This is often accompanied by OutOfMemoryError in the main logs. This likely means that the process needs to be allocated a larger heap. For example, the NameNode is trying to store more inodes than will fit into its current heap size.
If the JVM process crashes, the JVM logs a crash report to this path. The report includes useful information like a list of the threads that were running at the time, and possibly a native code backtrace. If not specified, the default is hs_err_pid<pid>.log in the process working directory.
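The standard HotSpot option here is -XX:ErrorFile, in which %p expands to the process ID; the directory below is a placeholder:

```shell
# Write crash reports to a well-known location instead of the working directory
export HADOOP_NAMENODE_OPTS="-XX:ErrorFile=/var/log/hadoop/hs_err_pid%p.log ${HADOOP_NAMENODE_OPTS}"
```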
Dumps the state of the heap to a file when OutOfMemoryError is thrown. This can be useful for post-mortem analysis to see exactly what kinds of objects were consuming a lot of the heap.
If specified, then this is the path to the file where the heap dump will be written. The default is java_pid<pid>.hprof in the process working directory.
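The two standard HotSpot options for this are shown together below; the directory is a placeholder. Note that -XX:HeapDumpPath may point at a directory, in which case the default java_pid<pid>.hprof filename is used within it:

```shell
# Capture a heap dump on OutOfMemoryError for post-mortem analysis
export HADOOP_NAMENODE_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hadoop ${HADOOP_NAMENODE_OPTS}"
```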
JConsole is a general-purpose monitoring tool for displaying JMX metrics. It offers a few helpful things for watching garbage collection, particularly if a currently running process forgot to enable GC logging, and you want to get a glimpse of things without restarting the process to turn on GC logging.
The Memory tab charts information on memory usage, and can break it down by the different GC pools. There is also a “Perform GC” button here. That will force a stop-the-world full GC, so don’t click it unless you really mean it.
The VM Summary tab has a section that states the total amount of time spent in each kind of garbage collector. Keep in mind that this covers the whole lifetime of the JVM process, so the time spent is ever-growing, and you'll see larger numbers for longer-lived processes.
JVisualVM is a Java Virtual Machine Monitoring, Troubleshooting, and Profiling Tool. It offers multiple plugins, including the Visual GC plugin. This lets you interactively watch activity move across the various generations of the garbage collector.
Future Directions: The G1 Garbage Collector
The G1 Garbage Collector is a revamped garbage collection architecture that aims to replace CMS. I do not know of anyone who has done extensive testing of Hadoop with G1. We do not test and certify with G1. However, we would like to investigate G1 further in the future and perform testing. One of the most significant benefits over CMS is that it is a compacting collector. (See personal horror story earlier.)
Quoting the documentation:
The Garbage-First (G1) collector is a server-style garbage collector, targeted for multi-processor machines with large memories. It meets garbage collection (GC) pause time goals with a high probability, while achieving high throughput. The G1 garbage collector is fully supported in Oracle JDK 7 update 4 and later releases. The G1 collector is designed for applications that:
Can operate concurrently with applications threads like the CMS collector.
Compact free space without lengthy GC induced pause times.
Need more predictable GC pause durations.
Do not want to sacrifice a lot of throughput performance.
Do not require a much larger Java heap.
G1 is planned as the long term replacement for the Concurrent Mark-Sweep Collector (CMS). Comparing G1 with CMS, there are differences that make G1 a better solution. One difference is that G1 is a compacting collector. G1 compacts sufficiently to completely avoid the use of fine-grained free lists for allocation, and instead relies on regions. This considerably simplifies parts of the collector, and mostly eliminates potential fragmentation issues. Also, G1 offers more predictable garbage collection pauses than the CMS collector, and allows users to specify desired pause targets.
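For anyone wanting to experiment despite the caveats above, G1 is enabled with a single standard HotSpot flag, optionally paired with a pause-time goal; the 200 ms target below is an illustrative value, not a recommendation from this article:

```shell
# Experimental: swap CMS for G1 and state a soft pause-time goal
export HADOOP_NAMENODE_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=200 ${HADOOP_NAMENODE_OPTS}"
```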
Personal horror story part 2: After the 2-hour GC pause I discussed earlier, we investigated use of G1. It was considered beta at the time. It was designed specifically to address our kind of allocation pattern. What we saw at the time was that G1 out-performed every other garbage collection configuration we tried, including CMS. However, it also caused the JVM process to segfault approximately every 2 days. Based on stability concerns, we couldn’t put this configuration in production. That was several years ago, so perhaps G1 has stabilized by now.