Guidelines for initial garbage collection settings in HDP?

After studying the basics of Java GC, it seems that the Serial GC (the default) would be best for YARN containers (low core:task ratio), and that CMS or G1 would be best for long-running services that occupy more memory (master services and some edge servers). Are these assumptions valid?

What is recommended for worker services? Is there any situation in the HDP ecosystem where it's recommended to start with ParallelGC or ParallelOldGC?

I still hear of people using CMS, but it looks like G1 is meant to replace it as of Java 7+. Is there any reason to choose CMS over G1 when the latter is available?

Are there additional garbage collectors worth learning about, beyond: Serial, Parallel, ParallelOld, CMS, and G1?

1 ACCEPTED SOLUTION

I'm most familiar with GC tuning for HDFS, so I'll answer from that perspective.

As you expected, our recommendation for the HDFS daemons is CMS. In practice, we have found that some of the default settings for CMS are sub-optimal for the NameNode's heap usage pattern. In addition to enabling CMS, we recommend tuning a few of those settings.

I agree that G1 would be good to evaluate as the future direction. As of right now, we have not tested and certified with G1, so I can't recommend using it.

For more details, please refer to the NameNode garbage collection deep dive article that I just posted.

https://community.hortonworks.com/articles/14170/namenode-garbage-collection-configuration-best-pra....
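When comparing collectors for a daemon, it can help to confirm which collector the JVM is actually running with. The following is a minimal sketch using the standard `java.lang.management` API. The class name `GcReport` and the helper `family` are hypothetical, and the mapping from MXBean names to collector families is an assumption based on the names HotSpot JVMs commonly report; other JVMs or newer releases may report different names, which fall through to "other".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Hadoop or HDP.
final class GcReport {

    // Maps a GC MXBean name to a collector family.
    // Assumption: these are the names commonly reported by HotSpot JVMs.
    static String family(String beanName) {
        switch (beanName) {
            case "Copy":
            case "MarkSweepCompact":
                return "Serial";
            case "PS Scavenge":
            case "PS MarkSweep":
                return "Parallel";
            case "ParNew": // young-generation collector normally paired with CMS
            case "ConcurrentMarkSweep":
                return "CMS";
            default:
                return beanName.startsWith("G1 ") ? "G1" : "other";
        }
    }

    // Builds one line per collector registered in the running JVM,
    // with its cumulative collection count and time in milliseconds.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + " -> " + family(gc.getName())
                    + " (collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime() + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

Running it with a different collector flag, for example `java -XX:+UseG1GC GcReport`, or `java -XX:+UseConcMarkSweepGC GcReport` on a JVM release that still ships CMS, should change the reported names accordingly. The same MXBeans are exposed over JMX, so a running daemon can be inspected without this class.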

4 REPLIES

@Alex Miller did you ever get an answer to this question outside the forum?

I discovered an internal doc by @Chris Nauroth that provides best practices and troubleshooting tips. Perhaps he would like to share it as a KB when time permits.

@Alex Miller, that's a great idea. I've just imported that doc as a new article here: https://community.hortonworks.com/articles/14170/namenode-garbage-collection-configuration-best-pra....
