Hi,
Let me start with the question: is Fair Scheduler preemption supposed to reclaim resources so that each queue/pool reaches its instantaneous fair share, or not?
I have a YARN cluster with two pools:
- default (weight 1, no preemption) - used by low-priority jobs
- high (weight 9, fairSharePreemptionTimeout of 60s and fairSharePreemptionThreshold of 1.0) - used by a critical application
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<allocations>
  <queue name="root">
    <weight>1.0</weight>
    <schedulingPolicy>drf</schedulingPolicy>
    <queue name="default">
      <weight>1.0</weight>
      <schedulingPolicy>drf</schedulingPolicy>
      <aclSubmitApps>*</aclSubmitApps>
      <aclAdministerApps>*</aclAdministerApps>
    </queue>
    <queue name="high">
      <weight>9.0</weight>
      <fairSharePreemptionTimeout>60</fairSharePreemptionTimeout>
      <fairSharePreemptionThreshold>1.0</fairSharePreemptionThreshold>
      <schedulingPolicy>drf</schedulingPolicy>
      <aclSubmitApps>high_user</aclSubmitApps>
      <aclAdministerApps>high_user</aclAdministerApps>
    </queue>
  </queue>
  <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
  <queuePlacementPolicy>
    <rule name="specified" create="false"/>
    <rule name="user" create="false"/>
    <rule name="default"/>
  </queuePlacementPolicy>
</allocations>
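For completeness, preemption itself is enabled cluster-wide in yarn-site.xml; a minimal sketch of the switch I mean (the standard Fair Scheduler property, everything else left at its defaults):
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>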
In the last three days I had the following issue:
- The cluster is empty
- A user submits a Spark application to the default queue, and it uses all the cluster resources.
- A user submits a Spark application to the high queue. This application could potentially use all the cluster resources as well.
- The high application waits 60 seconds for the fair-share preemption timeout to expire.
- Preemption kicks in and removes some containers from the default application.
- I end up with the two pools/queues split 50/50 default/high, or at best 30/70 default/high.
Is this the expected behaviour? Why does the final resource ratio not respect the weights? With weights 1 and 9 I was expecting roughly a 10/90 split, since high's fair share should be 9 / (1 + 9) = 90% of the cluster...
This issue occurred with real jobs, but I also managed to reproduce it with a simple spark-shell:
- Launch a job in the default queue with: spark-shell --master yarn --num-executors=50 --executor-memory=30G --executor-cores=10 --queue default (the executor count/cores/memory are deliberately oversized just to saturate the cluster)
- After that job has filled the cluster, I launch a job in the high-priority queue: spark-shell --master yarn --num-executors=50 --executor-memory=30G --executor-cores=10 --queue high
- I wait for 60s for the preemption.
- The final resource distribution is 50/50 (default/high)
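If it helps, the per-queue usage and fair shares can be inspected from the ResourceManager scheduler REST endpoint while the repro is running; a sketch, assuming the RM web UI on its default port 8088 (exact JSON field names may vary by version):
# dump the Fair Scheduler queue info (used resources and fair shares per queue)
curl -s http://<resourcemanager-host>:8088/ws/v1/cluster/scheduler | python -m json.tool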
Thanks,
p.