Fair scheduler preemption and resource pools weights



Let me start with the question: is Fair Scheduler preemption supposed to preempt resources so that each queue/pool gets its instantaneous fair share, or not?

I have a YARN cluster with two pools:

  • default (weight 1, no preemption): used by low-priority jobs
  • high (weight 9, FairSharePreemptionTimeout of 60 s, FairSharePreemptionThreshold of 1.0): used by a critical application


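<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<allocations>
    <queue name="root">
        <queue name="default"><weight>1.0</weight></queue>
        <queue name="high">
            <weight>9.0</weight>
            <fairSharePreemptionTimeout>60</fairSharePreemptionTimeout>
            <fairSharePreemptionThreshold>1.0</fairSharePreemptionThreshold>
        </queue>
    </queue>
    <queuePlacementPolicy>
        <rule name="specified" create="false"/>
        <rule name="user" create="false"/>
        <rule name="default"/>
    </queuePlacementPolicy>
</allocations>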

In the last three days I had the following issue:

  1. The cluster is empty
  2. A user submits a Spark application to the default queue, using all the cluster resources.
  3. A user submits a Spark application to the high queue. Potentially this application wants to use all the cluster resources.
  4. The high application waits 60 seconds for the preemption timeout.
  5. The preemption kicks in and removes some containers from the default application
  6. The two pools/queues now end up at 50/50 default/high, or even 30/70 default/high


Is this the expected behaviour? Why does the final resource ratio not respect the weights? I was expecting 10/90 given the weights...
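
The 10/90 expectation comes from treating the two weights as proportions of the cluster. A minimal Python sketch of that arithmetic, assuming both queues have enough pending demand to consume their full share and that no other limits (min/max resources) apply; the function name `weighted_shares` is hypothetical and is not part of YARN:

```python
def weighted_shares(weights):
    """Return each queue's fraction of the cluster, proportional to its weight.

    Assumption: every queue has enough demand to use its whole share, so
    no unused capacity is redistributed to the other queues.
    """
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Weights taken from the two pools described above.
shares = weighted_shares({"default": 1, "high": 9})

for queue, share in shares.items():
    print(f"{queue}: {share:.0%}")
```

With weights 1 and 9 this gives one tenth of the cluster to default and nine tenths to high, which is the ratio the observed 50/50 split is being compared against.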

This issue occurred with real jobs, but I also managed to reproduce it with a simple spark-shell:


  1. Launch a job in the default queue with: spark-shell --master yarn --num-executors=50 --executor-memory=30G --executor-cores=10 --queue default (the executor count, cores and memory are set very high just to max out the cluster)
  2. After the job fills the cluster resources, I launch a job in the high-priority queue: spark-shell --master yarn --num-executors=50 --executor-memory=30G --executor-cores=10 --queue high
  3. I wait 60 s for the preemption.
  4. The final resource distribution is 50/50 (default/high).






I don't have anything to back this up, but in my experience this is expected. Weights are a percentage, and overall the Fair Scheduler works to ensure an equal share. There are other settings you can use to ensure that the high queue gets more, but with just this configuration I would expect both jobs to get roughly half of the resources.

This would obviously change if more jobs were added to either queue or if you have more than just two queues.