As of HDP 2.3 we now have the ability to enable "Fair Sharing" policies within a queue (images below).
1) Is there any public Apache documentation on how this component behaves? I couldn't find it here: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html#Configurat...
2) Is "Pre-emption" required in order to enable the "Fair Sharing" policy within Capacity Scheduler?
2) Preemption is not a must for fair scheduling.
The Fair Scheduler lets all apps run by default, but it is also possible to limit the number of running apps per user and per queue through the config file. This can be useful when a user must submit hundreds of apps at once, or in general to improve performance if running too many apps at once would cause too much intermediate data to be created or too much context-switching. Limiting the apps does not cause any subsequently submitted apps to fail, only to wait in the scheduler’s queue until some of the user’s earlier apps finish.
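To make the "wait, don't fail" behaviour concrete, here is a minimal toy model in Python. It is not YARN code: the class `LimitedQueue` and its methods are hypothetical names invented for illustration. It only shows the idea that apps over a per-user limit stay queued until an earlier app from the same user finishes. In the real Fair Scheduler the limit is set in the allocation file (the `maxRunningApps` element, as far as I know).

```python
from collections import deque


class LimitedQueue:
    """Toy model of a per-user running-app limit (hypothetical, not YARN code)."""

    def __init__(self, max_running_per_user):
        self.max_running_per_user = max_running_per_user
        self.running = {}        # user -> set of running app ids
        self.waiting = deque()   # (user, app_id) pairs in submission order

    def submit(self, user, app_id):
        # Submission never fails; the app is queued and admitted if possible.
        self.waiting.append((user, app_id))
        self._admit()

    def finish(self, user, app_id):
        # A finished app frees a place for the same user's waiting apps.
        self.running[user].remove(app_id)
        self._admit()

    def _admit(self):
        # Walk the waiting list in order and start every app whose user
        # is still below the limit; the rest keep waiting.
        still_waiting = deque()
        for user, app_id in self.waiting:
            active = self.running.setdefault(user, set())
            if len(active) < self.max_running_per_user:
                active.add(app_id)
            else:
                still_waiting.append((user, app_id))
        self.waiting = still_waiting


q = LimitedQueue(max_running_per_user=2)
for app in ("a1", "a2", "a3"):
    q.submit("alice", app)
running_before = set(q.running["alice"])   # a3 is accepted but not running
waiting_before = list(q.waiting)
q.finish("alice", "a1")
running_after = set(q.running["alice"])    # a3 starts once a1 is done
```

With a limit of 2, the third app is accepted but sits in the waiting list until the first one finishes.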
Preemption is not needed. Fair ordering only changes how new task slots within a queue are handed out among the jobs in that same queue.
For example, with FIFO, if you have one queue and a job comes in that needs 5000 tasks, it will block that queue until it has finished.
If you enable Fair ordering, task slots that free up are distributed among the other jobs in the same queue that YARN has accepted. For example, assume you have:
Job1 with 5000 tasks (takes an hour)
Job2 with 10 tasks, submitted one second later
Job3 with 10 tasks, submitted one minute later
With FIFO, Job1 runs for an hour, then Job2 runs, then Job3.
With Fair ordering, once the first set of Job1's tasks finishes, the freed slots are distributed among the other waiting jobs, so Job2 and Job3 finish earlier and Job1 finishes a bit later.
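The difference can be sketched with a small simulation of the three-job example above (5000, 10 and 10 tasks). This is a rough model under stated assumptions, not the Capacity Scheduler's actual algorithm: the queue has 100 slots, every task takes exactly one tick, the two small jobs arrive at ticks 1 and 2, and "fair" is approximated by always giving the next free slot to the job that currently holds the fewest slots. The function `simulate` and the job names are hypothetical.

```python
def simulate(jobs, slots, policy):
    """Return {job name: tick at which the job finishes}.

    jobs   -- list of (name, arrival_tick, num_tasks), num_tasks > 0
    slots  -- task slots in the queue; each task occupies a slot for one tick
    policy -- "fifo" or "fair"
    """
    pending = {name: n for name, _, n in jobs}
    arrival = {name: t for name, t, _ in jobs}
    order = [name for name, _, _ in sorted(jobs, key=lambda j: j[1])]
    finish = {}
    tick = 0
    while len(finish) < len(jobs):
        # Jobs that have arrived and still have tasks, in submission order.
        active = [n for n in order if arrival[n] <= tick and pending[n] > 0]
        granted = {n: 0 for n in active}
        free = slots
        if policy == "fifo":
            # The oldest job takes everything it can use before the next one.
            for n in active:
                take = min(free, pending[n])
                granted[n] = take
                free -= take
        elif policy == "fair":
            # Hand out one slot at a time to the job holding the fewest slots.
            while free > 0:
                candidates = [n for n in active if granted[n] < pending[n]]
                if not candidates:
                    break
                n = min(candidates, key=lambda c: granted[c])
                granted[n] += 1
                free -= 1
        else:
            raise ValueError("unknown policy: " + policy)
        tick += 1
        for n, g in granted.items():
            pending[n] -= g
            if pending[n] == 0:
                finish[n] = tick
    return finish


jobs = [("job1", 0, 5000), ("job2", 1, 10), ("job3", 2, 10)]
fifo_result = simulate(jobs, slots=100, policy="fifo")
fair_result = simulate(jobs, slots=100, policy="fair")
print("fifo:", fifo_result)
print("fair:", fair_result)
```

In this toy model the small jobs finish at ticks 2 and 3 with fair ordering instead of tick 51 with FIFO, while the large job finishes at tick 51 instead of tick 50. No running task is ever killed, which is why preemption does not come into it.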
Other queues are not impacted at all, and preemption is a completely orthogonal concept.