Issue description: I am seeing a very high load average on 10-15 nodes, the ones running the highest number of containers. About 45 running containers on a node result in a load average of roughly 300 to 500.
Background: I have an HDP-2.6 cluster managed by Ambari, with 58 nodes that each run a DataNode and a NodeManager. Each node has 48 cores and 504 GB RAM.
I noticed that the load average is always high (between 300 and 500) on a particular set of 10-15 nodes, each running about 45 containers, whereas the other nodes run 10-20 containers with a load average of 3 to 50.
When we stop the NodeManagers on those 10-15 nodes, the load average climbs on another, seemingly random set of 10-15 nodes.
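For scale, the load figures above can be normalized by core count. This is only a rough sketch: it assumes the 48 cores are per node, and the helper name `load_per_core` is made up for illustration.

```python
# Assumption: 48 cores per node, taken from the cluster description.
CORES_PER_NODE = 48


def load_per_core(load_avg, cores=CORES_PER_NODE):
    """Return the load average divided by the number of cores."""
    return load_avg / cores


# Load averages quoted in this post (busy nodes vs. the remaining nodes).
samples = [
    ("busy node, low end", 300),
    ("busy node, high end", 500),
    ("other node, high end", 50),
]

for label, load in samples:
    print(f"{label}: {load_per_core(load):.2f} load per core")
```

On the busy nodes this works out to roughly 6 to 10 times the core count, while the other nodes stay at about 1 or below.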
I am not sure why the containers are distributed so unevenly across the NodeManagers. The high CPU load average on those machines slows down the tasks submitted to them, which in turn slows down the jobs. Any help on this is highly appreciated!