NodeManager Health is bad: Issue due to garbage collection

Champion

We are getting the following error from YARN: NodeManager Health is bad: GC Duration:
Average time spent in garbage collection was 45.2 second(s) (75.40%) per minute over the previous 5 minute(s). Critical threshold: 60.00%.
Average time spent in garbage collection was 30.3 second(s) (50.45%) per minute over the previous 5 minute(s). Warning threshold: 30.00%.

 

Below is my configuration:

We are currently using the default setting for CM -> Yarn -> Configuration -> Java Configuration Options for Node Manager:

-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled

CM -> Yarn -> Configuration -> nodemanager_gc_duration_window:

5 minute(s)

CM -> Yarn -> Configuration -> nodemanager_gc_duration_thresholds:

Warning: 30.0
Critical: 60.0

I went through the link below, but it doesn't cover how to fix this issue:

https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_ht_nodemanager.html

Below are my questions:

1. The environment was healthy for more than a year, but we are seeing this issue now. Why? Is it due to increased usage?

2. Do we need to clear any old garbage from the environment to fix this issue? If so, how?

3. Do we need to change any configuration to fix this issue? If so, how?

4. Do we need to do both step 2 and step 3, by any chance?

Cloudera Employee

@saranvisa This health check result indicates that the NodeManager does not have enough heap space for its workload. Typically, when the workload in the cluster grows and the Java daemon therefore needs more heap, you need to give more heap to the role.

You could:

1. Increase the heap given to the NodeManager through its configuration page ('Java Heap Size of NodeManager in Bytes').

2. Alternatively, though not recommended, tune the thresholds you found (nodemanager_gc_duration_thresholds) to tolerate a higher GC ratio for the NodeManager.
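
To make the threshold arithmetic concrete, here is a minimal Python sketch of how an average GC time per minute maps to a percentage and a health state. It is not Cloudera Manager's actual implementation: the function name `gc_health` is hypothetical, and treating a value exactly at a threshold as a breach is an assumption. The defaults mirror the Warning 30.0 / Critical 60.0 values from the question.

```python
def gc_health(gc_seconds_per_minute, warning_pct=30.0, critical_pct=60.0):
    """Classify average GC time per minute against percentage thresholds.

    Assumption: a value equal to a threshold counts as breaching it.
    """
    # Share of each minute spent in garbage collection, as a percentage.
    pct = gc_seconds_per_minute / 60.0 * 100.0
    if pct >= critical_pct:
        status = "critical"
    elif pct >= warning_pct:
        status = "warning"
    else:
        status = "good"
    return round(pct, 2), status


# Values taken from the health messages in the question.
print(gc_health(45.2))  # status is "critical"
print(gc_health(30.3))  # status is "warning"
```

Raising the thresholds only changes the status that is reported. The JVM still spends the same share of each minute in GC, which is why the first option is preferred.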

 

I recommend going to the specific NodeManager's role instance page in Cloudera Manager and browsing the charts available for it. A chart named 'JVM heap memory usage' shows the heap consumption of that particular NodeManager. That gives you a better idea of how much memory the role is using, so you can raise its heap to a suitable higher value.
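
Once you have read a peak value off that chart, a rough sizing calculation could look like the Python sketch below. Everything in it is illustrative: the helper name `suggest_heap_bytes`, the 50% headroom, the rounding to 256 MiB steps, and the 900 MiB example peak are all assumptions, not Cloudera recommendations. The result is in bytes, matching the 'Java Heap Size of NodeManager in Bytes' setting.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def suggest_heap_bytes(peak_used_bytes, headroom=0.5, round_to=256 * MIB):
    """Return a heap size in bytes: observed peak plus headroom, rounded up.

    Assumptions: 50% headroom and 256 MiB rounding are arbitrary starting
    points; adjust them to your own workload.
    """
    target = peak_used_bytes * (1 + headroom)
    # Round up to the next multiple of round_to for a tidy setting.
    return math.ceil(target / round_to) * round_to


# Hypothetical peak of 900 MiB read from the 'JVM heap memory usage' chart.
heap = suggest_heap_bytes(900 * MIB)
print(heap, "bytes =", heap / GIB, "GiB")
```

After changing the heap size, watch the GC duration health test over a few windows to confirm that the ratio drops below the warning threshold.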