
NodeManagers crashing often - oom


Explorer

Hello Everyone,

 

In my dev cluster, there are 2 NodeManagers. They have been crashing frequently for the past few weeks because of memory issues, and as a temporary workaround I increased the heap size of the NodeManager process (from 512 MB to 6 GB as of now). Two days ago, it could not even start with 4 GB after a crash, and it only worked after increasing the heap to 6 GB. The graph below, taken from Cloudera Manager, shows the heap usage across the nodes (jvm_heap_used_mb_across_nodemanagers metric).
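For reference, a minimal sketch of how the NodeManager heap can be raised outside of CM (assumption: a yarn-env.sh based setup; in a CM-managed cluster the equivalent is the NodeManager Java heap size setting under the YARN service configuration):

# yarn-env.sh -- NodeManager JVM heap in MB (illustrative value matching the 6 GB above)
export YARN_NODEMANAGER_HEAPSIZE=6144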

 

Can you help me in this regard?

 

Thanks,

Mani

 

jvm_heap_size.png

 

4 REPLIES

Re: NodeManagers crashing often - oom

Explorer

I am also seeing a correlation with GC activity. The graph below shows it (jvm_gc_time_ms_rate_across_nodemanagers).

 

jvm_gc_time_ms_rate_across_nodemanagers.png
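A quick way to cross-check GC pressure directly on a NodeManager host (a sketch; <NM_PID> stands for the NodeManager process id):

# samples heap occupancy and cumulative GC time every 5 seconds
jstat -gcutil <NM_PID> 5000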


Re: NodeManagers crashing often - oom

Explorer

A few more observations:

 

While analysing the heap dump of the killed NodeManager JVM, I found that a HashMap in DeletionService.java is taking a huge amount of memory for some reason. Can you look into this?
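For anyone who wants to reproduce the analysis, a minimal sketch of capturing a dump from a running NodeManager before it dies (assumption: jmap from the same JDK is available on the host and <NM_PID> is the NodeManager process id; the resulting file can be opened in Eclipse MAT or a similar tool):

# writes a binary heap dump of live objects to /tmp
jmap -dump:live,format=b,file=/tmp/nm_heap.hprof <NM_PID>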

 

Thanks,

Mani

Re: NodeManagers crashing often - oom

New Contributor

Hi Mani,

 

I think I'm also running into this problem. I found my NodeManagers were occasionally being sent SIGKILL by Cloudera's killparent.sh script, which is run when the NM throws an OutOfMemoryError. In Cloudera Manager, I don't see JVM memory usage trending up, so it's a bit of a mystery why it suddenly hits OOM when, a second before, it was well below the limit.
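For context, a minimal sketch of how that hook is typically wired into the role's JVM options by the CM agent (the killparent.sh path below is illustrative and depends on the agent install location):

# HotSpot runs this script when the JVM throws OutOfMemoryError
-XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh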

 

Anyway, please share if you find anything... I will as well!

Re: NodeManagers crashing often - oom

Explorer
Hi,

In my case, yarn.nodemanager.delete.debug-delay-sec had been configured with a very high value (100+ days). As a result, a large number of deletion tasks were scheduled in the DeletionService, but the actual deletion would only happen after that delay, so all of the task information stayed in memory and was never cleared, consuming a lot of heap.

The NodeManager JVM heap usage metric reflects this growth pattern.
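For anyone hitting the same issue, a minimal sketch of the fix (assumption: the property is set in yarn-site.xml, or through the NodeManager advanced configuration snippet for yarn-site.xml in CM; 0 is the upstream default and means localized files and logs are deleted as soon as the application finishes):

<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>0</value>
</property>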

Thanks,
Mani