How can I set priority for queues with the Capacity Scheduler?
Labels: Apache YARN
Created ‎02-11-2016 12:39 AM
I think that, at the moment, controlling priority through capacity and preemption is the only way to start and finish high-priority jobs faster. Am I correct?
Ideally, I would like to set a priority *per* job/application, but I found https://issues.apache.org/jira/browse/YARN-1963 , so I guess this is not possible yet.
I also found http://blog.sequenceiq.com/blog/2014/03/14/yarn-capacity-scheduler/ , in which the queues are named "highPriority" and "lowPriority". If my reading is correct, this does not actually set any priority; jobs in the high queue simply finish faster because that queue has more capacity.
Until YARN-1963 is released, I would like jobs in the highPriority queue to always start before any jobs in the lowPriority queue and, if possible, I would like low-priority jobs to wait until the high-priority jobs finish.
Any advice/hint is welcome.
Created ‎02-11-2016 01:23 AM
In the Capacity Scheduler you could set up a high-priority queue with 90% of the cluster, allowed to expand (maximum capacity) to 100%, and a low-priority queue with 10%, also allowed to expand (maximum capacity) to 100%. In this case, jobs in the first queue would always get 90% of the cluster if they need it, and the second queue would get only a small share of the cluster while the high-priority queue has queries running. The low-priority queue could still monopolize the cluster if it has very long-running tasks, but you could fix that with preemption (or by making sure tasks in your cluster don't run for too long, which they shouldn't anyway).
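To make the 90/10 split concrete, here is a minimal sketch in Python that builds the matching property entries and prints them in Hadoop's `<configuration>` XML layout. The queue names `highPriority` and `lowPriority` are borrowed from the question, and the helper names `to_hadoop_xml` and `check_capacities` are hypothetical, made up for this illustration. The property keys are the standard Capacity Scheduler ones as far as I know, but please check them against the documentation for your Hadoop version before using them.

```python
import xml.etree.ElementTree as ET

# Assumed queue names (from the question); the 90/10 split follows the answer.
# These entries would go into capacity-scheduler.xml.
CAPACITY_SCHEDULER = {
    "yarn.scheduler.capacity.root.queues": "highPriority,lowPriority",
    "yarn.scheduler.capacity.root.highPriority.capacity": "90",
    "yarn.scheduler.capacity.root.highPriority.maximum-capacity": "100",
    "yarn.scheduler.capacity.root.lowPriority.capacity": "10",
    "yarn.scheduler.capacity.root.lowPriority.maximum-capacity": "100",
}

# Preemption is switched on through the scheduler monitor in yarn-site.xml,
# not in capacity-scheduler.xml.
YARN_SITE = {
    "yarn.resourcemanager.scheduler.monitor.enable": "true",
}


def to_hadoop_xml(props):
    """Render a dict of properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def check_capacities(props, parent="root"):
    """Return True if the capacities of the queues under `parent` sum to 100."""
    prefix = "yarn.scheduler.capacity." + parent
    queues = props[prefix + ".queues"].split(",")
    total = sum(float(props[prefix + "." + q + ".capacity"]) for q in queues)
    return total == 100.0


if __name__ == "__main__":
    # Sibling queue capacities have to add up to 100 percent.
    assert check_capacities(CAPACITY_SCHEDULER)
    print(to_hadoop_xml(CAPACITY_SCHEDULER))
    print(to_hadoop_xml(YARN_SITE))
```

Both queues keep a maximum capacity of 100 so either one can use idle resources; only the guaranteed share (90 versus 10) differs, which is why preemption is needed to reclaim containers from long-running low-priority tasks.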
Created ‎02-11-2016 12:40 AM
@Hajime Aside from telling a job to elevate its priority, I don't know of any other mechanisms. With Oozie, Sqoop, and MR it's possible. It is probably possible with Spark and the rest too.
Created ‎02-11-2016 12:40 AM
Created ‎02-11-2016 12:54 AM
Thank you.
I didn't know about "mapred.capacity-scheduler<queue-name>.supports-priority".
Is this supported?
I can't find any code matching "supports-priority" in the Hadoop project...
