How can I set priority on queues with the Capacity Scheduler?

I think that, at the moment, controlling priority through capacity and preemption is the only way to start and finish high-priority jobs faster. Am I correct?

Ideally, I would like to set a priority per job/application, but I found https://issues.apache.org/jira/browse/YARN-1963 , so I guess this is not possible yet.

I also found http://blog.sequenceiq.com/blog/2014/03/14/yarn-capacity-scheduler/ , in which the queues are named "highPriority" and "lowPriority". If my reading is correct, though, this does not actually set any priority: jobs in the high queue simply finish faster because that queue has more capacity.

Until YARN-1963 is released, I would like jobs in the highPriority queue to always start before any jobs in the lowPriority queue and, if possible, I would like low-priority jobs to wait until the high-priority jobs finish.

Any advice or hints are welcome.

1 ACCEPTED SOLUTION

Master Guru

In the Capacity Scheduler you could set up a high-priority queue with 90% of the cluster and a maximum capacity of 100%, and a low-priority queue with 10% and a maximum capacity of 100%. In this case jobs in the first queue would always get 90% of the cluster if they need it, and the second queue would get only a small share of the cluster while the high-priority queue has queries to run. The low-priority queue would still be able to monopolize the cluster if it has very long-running tasks, because resources it has already been given are not taken back until those tasks finish. You could fix that with preemption (or by making sure tasks in your cluster don't run for too long, which they shouldn't anyway).
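
As a rough sketch of that setup, the fragment below shows what the two queues might look like in capacity-scheduler.xml, followed by the yarn-site.xml settings that switch on preemption. The queue names highPriority and lowPriority are borrowed from the blog post in the question and are only illustrative, and the percentages are the ones suggested above. Treat it as a starting point and check the property names against the documentation for your Hadoop version.

```xml
<!-- capacity-scheduler.xml (fragment): two illustrative queues under root -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>highPriority,lowPriority</value>
</property>

<!-- Guaranteed share: 90% of the cluster, allowed to grow to 100% -->
<property>
  <name>yarn.scheduler.capacity.root.highPriority.capacity</name>
  <value>90</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.highPriority.maximum-capacity</name>
  <value>100</value>
</property>

<!-- Guaranteed share: 10% of the cluster, allowed to grow to 100% when idle capacity exists -->
<property>
  <name>yarn.scheduler.capacity.root.lowPriority.capacity</name>
  <value>10</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.lowPriority.maximum-capacity</name>
  <value>100</value>
</property>

<!-- yarn-site.xml (fragment): enable the scheduler monitor that performs preemption -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>
```

Queue changes in capacity-scheduler.xml can usually be reloaded with `yarn rmadmin -refreshQueues`, while the preemption settings in yarn-site.xml generally need a ResourceManager restart to take effect.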

4 REPLIES

Master Mentor

@Hajime aside from telling a job to elevate its priority, I don't know of other mechanisms. With Oozie, Sqoop, and MR it's possible. It is probably possible with Spark and the rest too.

Master Mentor

Thank you.

I didn't know about "mapred.capacity-scheduler<queue-name>.supports-priority".

Is this supported?

I can't find any code matching "supports-priority" in the Hadoop project...
