Created 05-30-2016 03:05 PM
Can someone explain the mechanism behind the property "yarn.scheduler.capacity.node-locality-delay" ?
from apache site i see -
Property | Description |
---|---|
yarn.scheduler.capacity.node-locality-delay | Number of missed scheduling opportunities after which the CapacityScheduler attempts to schedule rack-local containers. Typically, this should be set to number of nodes in the cluster. By default is setting approximately number of nodes in one rack which is 40. Positive integer value is expected. |
Created 05-31-2016 06:05 AM
@Sagar Shimpi this is nice post explaining how this property used by RMContainerAllocator while allocating a container upon request.
http://johnjianfang.blogspot.in/2014/08/delay-scheduling-in-capacity-scheduling.html
Created 05-30-2016 04:51 PM
This is what I got from
Hadoop: The Definitive Guide, 4th Edition
All the YARN schedulers try to honor locality requests. On a busy cluster, if an application requests a particular node, there is a good chance that other containers are running on it at the time of the request. The obvious course of action is to immediately loosen the locality requirement and allocate a container on the same rack. However, it has been observed in practice that waiting a short time (no more than a few seconds) can dramatically increase the chances of being allocated a container on the requested node, and therefore increase the efficiency of the cluster. This feature is called delay scheduling, and it is supported by both the Capacity Scheduler and the Fair Scheduler.
Every node manager in a YARN cluster periodically sends a heartbeat request to the resource manager—by default, one per second. Heartbeats carry information about the node manager’s running containers and the resources available for new containers, so each heartbeat is a potential scheduling opportunity for an application to run a container.
When using delay scheduling, the scheduler doesn’t simply use the first scheduling opportunity it receives, but waits for up to a given maximum number of scheduling opportunities to occur before loosening the locality constraint and taking the next scheduling opportunity.
For the Capacity Scheduler, delay scheduling is configured by setting yarn.scheduler.capacity.node-locality-delay
to a positive integer representing the number of scheduling opportunities that it is prepared to miss before loosening the node constraint to match any node in the same rack.
Created 05-31-2016 06:05 AM
@Sagar Shimpi this is nice post explaining how this property used by RMContainerAllocator while allocating a container upon request.
http://johnjianfang.blogspot.in/2014/08/delay-scheduling-in-capacity-scheduling.html