Support Questions

hosako · ‎02-11-2016

I think that, at the moment, controlling priority through capacity and preemption is the only way to start and finish high priority jobs faster. Am I correct?

Ideally, I would like to set a priority *per* job/application, but I found https://issues.apache.org/jira/browse/YARN-1963, so I guess this is not possible yet.

I also found http://blog.sequenceiq.com/blog/2014/03/14/yarn-capacity-scheduler/, which uses queues named "highPriority" and "lowPriority". If my reading is correct, though, this does not actually set any priority; the jobs in the high queue simply finish faster because that queue has more capacity.

Until YARN-1963 is released, I would like jobs in the highPriority queue to always start before any jobs in the lowPriority queue, and, if possible, I would like low priority jobs to wait until the high priority jobs finish.

Any advice/hint is welcome.

bleonhardi · ‎02-11-2016

In the Capacity Scheduler you could set a high priority queue at 90% of the cluster with extension (max capacity) to 100%, and a low priority queue with 10%, also with extension (max capacity) to 100%. In this case jobs in the first queue would always get 90% of the cluster if they need it, and the second queue would only get a tiny amount of the cluster while the high priority queue has jobs to run. The low priority queue would still be able to monopolize the cluster if it has very long-running tasks, since resources it already holds are only given back when those tasks finish. But you could fix that with preemption (or by making sure tasks in your cluster don't run for too long, which they shouldn't anyway).
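
As a rough sketch of that setup, assuming two queues directly under root named highPriority and lowPriority (the names are borrowed from the blog post in the question and are only placeholders), the configuration might look something like the fragment below. The percentages are the illustrative 90/10 split from above, and preemption is switched on separately in yarn-site.xml. Please check the property names against the documentation for your Hadoop version before relying on them.

```xml
<!-- capacity-scheduler.xml: illustrative values, queue names are placeholders -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>highPriority,lowPriority</value>
</property>
<property>
  <!-- guaranteed share for the high priority queue -->
  <name>yarn.scheduler.capacity.root.highPriority.capacity</name>
  <value>90</value>
</property>
<property>
  <!-- may grow to the whole cluster when the other queue is idle -->
  <name>yarn.scheduler.capacity.root.highPriority.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.lowPriority.capacity</name>
  <value>10</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.lowPriority.maximum-capacity</name>
  <value>100</value>
</property>

<!-- yarn-site.xml: enable the scheduler monitor that performs preemption -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>
```

The capacities of the queues under root have to add up to 100. With preemption enabled, containers held by lowPriority beyond its 10% can be reclaimed when highPriority needs its guaranteed share.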

aervits · ‎02-11-2016

@Hajime Aside from telling a job to elevate its priority, I don't know of other mechanisms. With Oozie, Sqoop, and MR it's possible, and it is probably possible with Spark and the rest too.

nsabharwal · ‎02-11-2016

@Hajime

See this https://community.hortonworks.com/questions/8725/capacityscheduler-job-priority-preemption.html

hosako · ‎02-11-2016

Thank you.

I didn't know "mapred.capacity-scheduler<queue-name>.supports-priority"

Is this supported?

I can't find any code matching "supports-priority" in the Hadoop project...

