I am having some trouble setting the following scheduler queue parameters:
I have 2 queues, Dev and Prod.
(If only one queue is in use, it should act as 100% of the cluster.)
Each queue is used by multiple users and resources should be shared equally between them, but when only one user is active in a queue, that user should get the entire capacity of the queue. Likewise, if a user is alone in the cluster, they should get 100% of the cluster; when a second user joins, the scheduler should share the available resources.
I want the users to share the capacity of the cluster; each should receive 50%.
The ordering policy is set to fair, but when one user takes all the resources and another user submits a job, the second job will not start until the first job finishes.
For some reason, Minimum User Limit has no effect.
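For context, the settings I am experimenting with look roughly like this (queue names and values here are illustrative, not my exact config):

```properties
# capacity-scheduler.xml, expressed as properties
yarn.scheduler.capacity.root.queues=Dev,Prod
# Each queue gets a fixed share but may grow to 100% when the other is idle
yarn.scheduler.capacity.root.Dev.capacity=50
yarn.scheduler.capacity.root.Dev.maximum-capacity=100
# Intent: with two active users in the queue, each is guaranteed at least 50% of it
yarn.scheduler.capacity.root.Dev.minimum-user-limit-percent=50
yarn.scheduler.capacity.root.Dev.ordering-policy=fair
```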
The default scheduler in HDP is the Capacity Scheduler. You should note the differences between the three scheduler types:
Capacity Scheduler: designed to run Hadoop applications as a shared, multi-tenant cluster in an operator-friendly manner while maximizing the throughput and utilization of the cluster.
FIFO Scheduler: the simplest scheduling algorithm; it simply queues applications in the order they arrive.
Fair Scheduler: assigns resources to applications such that all apps get, on average, an equal share of resources over time.
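Note that these are not mutually exclusive choices at the leaf level: the Capacity Scheduler lets you pick an ordering policy (fifo or fair) per queue. A minimal sketch (the queue name `dev` is just an example):

```properties
# In capacity-scheduler.xml: apps within this queue share resources fairly
yarn.scheduler.capacity.root.dev.ordering-policy=fair
# With fifo instead, apps in the queue would run in submission order:
# yarn.scheduler.capacity.root.dev.ordering-policy=fifo
```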
Having said that, in your example above Prod has 70%, so despite "Each queue is used by multiple users and resources should be shared equally", jobs in the Prod queue will have priority over the Dev queue, which I think is the desired config.
Can you share your Capacity Scheduler values?
yarn.resourcemanager.scheduler.class = org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
Currently the Capacity Scheduler values differ a bit from what I described, because of an integration.
There are currently 3 queues: default at 50%, and bt and opt at 25% each.
I have done the testing on the default queue.
yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.root.accessible-node-labels=*
yarn.scheduler.capacity.root.acl_administer_queue=yarn
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_submit_applications=yarn
yarn.scheduler.capacity.root.default.capacity=50
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=2
yarn.scheduler.capacity.root.queues=bt,default,opt
yarn.scheduler.capacity.queue-mappings-override.enable=false
yarn.scheduler.capacity.root.bt.acl_administer_queue=*
yarn.scheduler.capacity.root.bt.acl_submit_applications=*
yarn.scheduler.capacity.root.bt.capacity=25
yarn.scheduler.capacity.root.bt.maximum-capacity=100
yarn.scheduler.capacity.root.bt.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.bt.ordering-policy=fair
yarn.scheduler.capacity.root.bt.ordering-policy.fair.enable-size-based-weight=false
yarn.scheduler.capacity.root.bt.priority=0
yarn.scheduler.capacity.root.bt.state=RUNNING
yarn.scheduler.capacity.root.bt.user-limit-factor=1
yarn.scheduler.capacity.root.default.acl_administer_queue=yarn
yarn.scheduler.capacity.root.default.minimum-user-limit-percent=25
yarn.scheduler.capacity.root.default.ordering-policy=fair
yarn.scheduler.capacity.root.default.ordering-policy.fair.enable-size-based-weight=false
yarn.scheduler.capacity.root.default.priority=0
yarn.scheduler.capacity.root.opt.acl_administer_queue=*
yarn.scheduler.capacity.root.opt.acl_submit_applications=*
yarn.scheduler.capacity.root.opt.capacity=25
yarn.scheduler.capacity.root.opt.maximum-capacity=25
yarn.scheduler.capacity.root.opt.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.opt.ordering-policy=fair
yarn.scheduler.capacity.root.opt.ordering-policy.fair.enable-size-based-weight=false
yarn.scheduler.capacity.root.opt.priority=0
yarn.scheduler.capacity.root.opt.state=RUNNING
yarn.scheduler.capacity.root.opt.user-limit-factor=1
yarn.scheduler.capacity.root.priority=0
I made sub-queues for each user and it works, but you have to manage it per user (we are a small company with few users, so it is manageable).
I configured this on our grid, but I am still looking for a better approach.
In your case I would suggest the following configuration:
Dev queue: Capacity 30%, Max Capacity 70%, User Limit Factor: 4, Ordering policy: Fair
Prod queue: Capacity 70%, Max Capacity 100%, User Limit Factor: 2, Ordering policy: Fair
Make sure preemption is enabled in the YARN configuration.
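Cross-queue preemption is switched on cluster-wide rather than per queue; a sketch of the relevant yarn-site.xml properties (verify the values against your HDP version):

```xml
<!-- yarn-site.xml: enable the scheduler preemption monitor -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<!-- Policy that preempts containers to restore configured queue capacities -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>
```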
These configs should give you the desired result, letting each queue grow beyond its configured share when the other queue is idle.
The trick is the "User Limit Factor", which allows a single user in the Dev queue to "steal" resources from the Prod queue, up to 4 times the queue's configured capacity (capped by the queue's maximum capacity) while Prod is idle.
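Written out as capacity-scheduler properties, the suggested setup looks roughly like this (a sketch; the queue names `dev` and `prod` are assumptions, and since a queue's maximum capacity cannot be below its configured capacity, prod's is set to 100% here):

```properties
yarn.scheduler.capacity.root.queues=dev,prod
yarn.scheduler.capacity.root.dev.capacity=30
yarn.scheduler.capacity.root.dev.maximum-capacity=70
# A single user may take up to 4 x 30% = 120% of dev's capacity,
# capped by the queue's 70% maximum capacity
yarn.scheduler.capacity.root.dev.user-limit-factor=4
yarn.scheduler.capacity.root.dev.ordering-policy=fair
yarn.scheduler.capacity.root.prod.capacity=70
yarn.scheduler.capacity.root.prod.maximum-capacity=100
# A single user may take up to 2 x 70% = 140%, capped at 100%
yarn.scheduler.capacity.root.prod.user-limit-factor=2
yarn.scheduler.capacity.root.prod.ordering-policy=fair
```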
Thanks, but I have already done that.