Whenever we run a YARN job, the ResourceManager over-allocates one of the NodeManagers: far more containers are scheduled and launched on that NodeManager than on the others, which is impacting our jobs' SLA. When we stopped the NodeManager service on that machine and re-ran the job, the containers were distributed evenly.
Could anyone help me with this? We are using HDP 2.4. We are not using the Fair Scheduler, and preemption is not enabled either. (Screenshots attached: issue1.jpg, issue2.jpg.)
I suppose your standard container size is about 4 GB. Unless you are using cgroups, YARN allocates containers based only on memory settings; in your scenario, 119 containers against 476 GB of available memory works out to 4 GB per container, so a node advertising more memory will simply receive more containers. If you want fine-grained control over CPU scheduling, you will need to configure YARN to use cgroups.
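As a rough sketch, enabling cgroups means switching the NodeManager to the LinuxContainerExecutor and pointing it at a cgroups resources handler in yarn-site.xml. The property names below are the standard Hadoop 2.x ones, but the exact values (the cgroup hierarchy path, the executor group, whether YARN should mount cgroups itself) depend on your cluster setup, so treat this as a starting point rather than a drop-in config:

```xml
<!-- yarn-site.xml: sketch of cgroups-based CPU enforcement (verify against your HDP 2.4 docs) -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <!-- cgroup hierarchy YARN will place containers under; path is an example -->
  <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
  <value>/hadoop-yarn</value>
</property>
<property>
  <!-- Unix group the container-executor binary runs as; "hadoop" is a typical choice -->
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>hadoop</value>
</property>
```

On top of that, if you want the scheduler itself to consider vcores as well as memory when placing containers, the Capacity Scheduler needs `yarn.scheduler.capacity.resource-calculator` set to `org.apache.hadoop.yarn.util.resource.DominantResourceCalculator` in capacity-scheduler.xml; with the default calculator, only memory is counted, which matches the 476 GB / 119 containers = 4 GB arithmetic above.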