Uneven allocation of containers causing high load avg


Hello HCC,

We recently upgraded our prod and all dev clusters from HDP 2.5.3.0 to HDP 2.6.1.0. Since the upgrade we are observing odd behavior in HDP 2.6.1.0: some nodes are allocated a very high number of containers, which drives the load average on those servers very high and causes the nodes to go into the heartbeat-lost state.

When a node's load average gets very high, the NN marks it as a dead node, whereas the RM keeps assigning containers to it (we know that the RM and NN work independently), and all of those containers cause jobs to fail. Every time this happens we ask our SA team to reboot the affected servers to alleviate the issue. We did not see this behavior with HDP 2.5.3.0.

Please find the attached screenshots for reference; they show the nodes that received a very high number of containers and their load average.
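For anyone who wants the same per-node view as text rather than screenshots, here is a minimal sketch that ranks nodes by running container count using the ResourceManager REST endpoint `/ws/v1/cluster/nodes`. The RM address passed to `fetch_nodes` is a placeholder, and the field names (`nodeHostName`, `state`, `numContainers`) are taken from the YARN REST API as I understand it, so please check them against the output of your own cluster.

```python
import json
from urllib.request import urlopen


def fetch_nodes(rm_url):
    """Fetch the node list from the ResourceManager REST API.

    rm_url is an assumption, e.g. "http://<rm-host>:8088".
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/nodes") as resp:
        return json.load(resp)


def busiest_nodes(payload, top=5):
    """Return (host, state, numContainers) tuples, highest container count first."""
    nodes = (payload.get("nodes") or {}).get("node") or []
    rows = [
        (n.get("nodeHostName"), n.get("state"), n.get("numContainers", 0))
        for n in nodes
    ]
    return sorted(rows, key=lambda r: r[2], reverse=True)[:top]
```

Calling `busiest_nodes(fetch_nodes(rm_url))` at intervals should make it easier to see whether the same few nodes keep receiving most of the containers.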

Present versions: HDP 2.6.1.0 and Ambari 2.5.2.0

@Kuldeep Kulkarni @Jay SenSharma @Artem Ervits @ssathish

Attachments: ss-1.png, ss-2.png, ss-3.png, ss-4.png
1 REPLY


Log messages from one of the servers where we are observing this behavior:

2017-10-27 21:20:43,991 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 10059 for container-id container_e746_1508665985104_313505_01_002805: -1B of 4 GB physical memory used; -1B of 8.4 GB virtual memory used
2017-10-27 21:20:44,049 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 108356 for container-id container_e746_1508665985104_313505_01_002168: 1.2 MB of 4 GB physical memory used; 103.6 MB of 8.4 GB virtual memory used
2017-10-27 21:20:44,105 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 13033 for container-id container_e746_1508665985104_304789_01_002499: -1B of 4 GB physical memory used; -1B of 8.4 GB virtual memory used
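To get a quick count from a NodeManager log instead of reading it line by line, a small parser along these lines may help. It is only a sketch: the regular expression is written against the three lines above, the function names are my own, and I am assuming a `-1B` reading means the NodeManager could not obtain memory usage for that process tree.

```python
import re
from collections import Counter

# Pattern written against the ContainersMonitorImpl lines quoted above.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO \S*ContainersMonitorImpl: "
    r"Memory usage of ProcessTree (?P<pid>\d+) for container-id (?P<cid>container_\S+): "
    r"(?P<phys_used>.+?) of (?P<phys_limit>.+?) physical memory used; "
    r"(?P<virt_used>.+?) of (?P<virt_limit>.+?) virtual memory used$"
)


def parse_line(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


def app_of(container_id):
    """Derive the application id from a container id (with or without an epoch part)."""
    parts = container_id.split("_")
    return f"application_{parts[-4]}_{parts[-3]}"


def summarize(lines):
    """Count distinct containers, containers reporting -1B, and containers per application."""
    containers, unavailable, per_app = set(), set(), Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        cid = rec["cid"]
        if cid not in containers:
            containers.add(cid)
            per_app[app_of(cid)] += 1
        if rec["phys_used"] == "-1B":
            unavailable.add(cid)
    return {
        "containers": len(containers),
        "unavailable": len(unavailable),
        "per_app": dict(per_app),
    }
```

Running `summarize(open(path))` over the NodeManager log of an affected node and of a healthy node should show whether the affected node is monitoring far more containers, and which applications they belong to.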