Created 02-11-2016 12:39 AM
I think that, at the moment, controlling priority through capacity and preemption is the only way to start and finish high-priority jobs faster. Am I correct?
Ideally, I would like to set a priority *per* job/application, but I found https://issues.apache.org/jira/browse/YARN-1963, so I guess this is not possible yet.
I also found http://blog.sequenceiq.com/blog/2014/03/14/yarn-capacity-scheduler/, in which the queues are named "highPriority" and "lowPriority". If my reading is correct, though, this does not actually set any priority; jobs in the high queue simply finish faster because that queue has more capacity.
Until YARN-1963 is released, I would like jobs in the highPriority queue to always start before any jobs in the lowPriority queue and, if possible, I would like low-priority jobs to wait until the high-priority jobs finish.
Any advice/hint is welcome.
Created 02-11-2016 01:23 AM
In the capacity scheduler you could set up a high-priority queue with 90% of the cluster and a maximum capacity (the limit it can extend to) of 100%, and a low-priority queue with 10% and a maximum capacity of 100%. In this case jobs in the first queue would always get 90% of the cluster if they need it, and the second queue would get only a small share of the cluster while the high-priority queue has jobs or queries to run. The low-priority queue would still be able to monopolize the cluster if it has very long-running tasks, because containers that are already running are not taken back on their own. But you could fix that with preemption (or by making sure tasks in your cluster don't run for too long, which they shouldn't anyway).
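To make the 90/10 layout above concrete, here is a minimal sketch in Python that generates the corresponding properties. The queue names are taken from the question; the helper names (`two_queue_capacity_config`, `to_hadoop_xml`) are made up for illustration. The capacity properties belong in capacity-scheduler.xml and the preemption properties in yarn-site.xml; please check them against the documentation for your Hadoop version before using them.

```python
from xml.sax.saxutils import escape

PREFIX = "yarn.scheduler.capacity.root"

def two_queue_capacity_config(high="highPriority", low="lowPriority",
                              high_pct=90, low_pct=10, max_pct=100):
    """Properties for capacity-scheduler.xml (hypothetical helper)."""
    # Capacities of sibling queues under root have to add up to 100.
    if high_pct + low_pct != 100:
        raise ValueError("queue capacities must sum to 100")
    return {
        f"{PREFIX}.queues": f"{high},{low}",
        f"{PREFIX}.{high}.capacity": str(high_pct),
        f"{PREFIX}.{high}.maximum-capacity": str(max_pct),
        f"{PREFIX}.{low}.capacity": str(low_pct),
        f"{PREFIX}.{low}.maximum-capacity": str(max_pct),
    }

# Properties for yarn-site.xml that switch on the preemption monitor.
PREEMPTION_CONFIG = {
    "yarn.resourcemanager.scheduler.monitor.enable": "true",
    "yarn.resourcemanager.scheduler.monitor.policies":
        "org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity."
        "ProportionalCapacityPreemptionPolicy",
}

def to_hadoop_xml(props):
    """Render a dict as a Hadoop-style <configuration> document."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)
```

Calling `print(to_hadoop_xml(two_queue_capacity_config()))` gives a fragment you can compare with your existing capacity-scheduler.xml. With both maximum capacities at 100, either queue can grow into idle capacity, and preemption is what lets the high-priority queue get its 90% back when it needs it.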
Created 02-11-2016 12:40 AM
@Hajime Aside from telling a job to elevate its priority, I don't know of other mechanisms. With Oozie, Sqoop and MR it's possible. It is probably possible with Spark and the rest too.
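For MR, a rough sketch of what "telling a job" could look like: the helper below (a made-up name, `job_submit_args`) builds the generic `-D` options that pick a queue and request a priority. `mapreduce.job.queuename` and `mapreduce.job.priority` are standard MapReduce properties, but the `-D` form only works for jobs that parse generic options (for example via ToolRunner), and whether the scheduler actually honors the priority depends on the scheduler and Hadoop version, so treat this as an assumption to verify.

```python
# Priority names accepted by mapreduce.job.priority.
VALID_PRIORITIES = ("VERY_HIGH", "HIGH", "NORMAL", "LOW", "VERY_LOW")

def job_submit_args(queue, priority="NORMAL"):
    """Build -D options for a MapReduce job submission (hypothetical helper)."""
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"unknown priority: {priority}")
    return [
        f"-Dmapreduce.job.queuename={queue}",   # which queue to submit to
        f"-Dmapreduce.job.priority={priority}", # requested job priority
    ]
```

For example, `job_submit_args("highPriority", "HIGH")` returns the two options to place after the main class name in a `hadoop jar` command line.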
Created 02-11-2016 12:40 AM
Created 02-11-2016 12:54 AM
Thank you.
I didn't know about "mapred.capacity-scheduler<queue-name>.supports-priority".
Is this supported?
I can't find any code matching "supports-priority" in the Hadoop project...