Hi, I have been stuck on this issue for quite some time now. We have a cluster managed by YARN, with spot instances as workers in auto-scaling groups, and we are currently using the Fair Scheduler. We often need to add more resources to particular queues based on the SLAs we have (we track them in our repo, from AWS metrics to YARN metrics, etc.). These SLAs are usually time-bound, but sometimes metric-dependent as well.

The requirement is this: for a particular SLA (with a given allowed deviation), if we determine that the queue's current resource allocation will not let us meet the SLA (based on application progress, etc.), we add more instances (resources/containers) to the cluster. What we want to control is that these newly added resources are taken up by one specific queue only, so that the applications in that queue speed up and we meet our SLAs.
It would be really helpful if someone could point us toward a solution for this.
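For context, this is the kind of per-queue guarantee we already use in fair-scheduler.xml; what we have not found is a way to steer newly added nodes to a single queue. A minimal sketch (the queue name `sla-critical` and the resource figures are placeholders, not our actual config):

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: per-queue guarantees (illustrative values only) -->
<allocations>
  <queue name="sla-critical">
    <!-- minimum share the scheduler tries to satisfy before other queues -->
    <minResources>40960 mb,16 vcores</minResources>
    <!-- relative share of any capacity beyond the minimums -->
    <weight>3.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
  <queue name="default">
    <weight>1.0</weight>
  </queue>
</allocations>
```

With weights alone, extra capacity is still split proportionally across queues rather than dedicated to one.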
In HDP, YARN uses the Capacity Scheduler, and within each queue you can set the Ordering Policy to Fair. Is that what you mean when you say you are using the Fair Scheduler? In Ambari, what do you have Yarn > Configs > Advanced > Scheduler > yarn.resourcemanager.scheduler.class set to?
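For reference, if the Fair Scheduler really is in use, that property in yarn-site.xml would look like this:

```xml
<!-- yarn-site.xml: ResourceManager scheduler selection -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```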
I would recommend using preemption. Do you have it enabled?
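Assuming the Capacity Scheduler, preemption is enabled in yarn-site.xml roughly as follows (these are the stock Hadoop property names; tune the policy's own knobs separately):

```xml
<!-- yarn-site.xml: enable Capacity Scheduler preemption -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>
```

With this on, containers can be reclaimed from over-served queues so an under-served queue reaches its guaranteed capacity faster, which is close to what you are after when new nodes join.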
The default scheduler in HDP is org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler. Why did you change this to the Fair Scheduler?