
Making Sense of YARN Fair Scheduler and Weights

We're running into a situation where some of our MR jobs are hanging due to lack of available mappers and reducers, and I'm trying to understand how our Fair Scheduler is set up in order to better troubleshoot where the issue is rather than changing configs on a whim. Here is our fair-scheduler.xml:

<allocations>
    <queue name="root">
        <weight>1.0</weight>
        <schedulingPolicy>drf</schedulingPolicy>
        <aclSubmitApps>*</aclSubmitApps>
        <aclAdministerApps>*</aclAdministerApps>
        <queue name="maintenance">
            <minResources>1 mb, 1 vcores</minResources>
            <maxResources>125000 mb, 64 vcores</maxResources>
            <maxRunningApps>1</maxRunningApps>
            <weight>1.0</weight>
            <schedulingPolicy>drf</schedulingPolicy>
            <aclSubmitApps>*</aclSubmitApps>
            <aclAdministerApps>*</aclAdministerApps>
        </queue>
        <queue name="prod">
            <minResources>1 mb, 1 vcores</minResources>
            <maxResources>125000 mb, 75 vcores</maxResources>
            <weight>3.0</weight>
            <schedulingPolicy>drf</schedulingPolicy>
            <aclSubmitApps>*</aclSubmitApps>
            <aclAdministerApps>*</aclAdministerApps>
        </queue>
        <queue name="datascience">
            <minResources>1 mb, 1 vcores</minResources>
            <maxResources>250000 mb, 125 vcores</maxResources>
            <weight>3.0</weight>
            <schedulingPolicy>drf</schedulingPolicy>
            <aclSubmitApps>*</aclSubmitApps>
            <aclAdministerApps>*</aclAdministerApps>
        </queue>
    </queue>
    <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
    <queuePlacementPolicy>
        <rule name="specified" create="true"/>
        <rule name="user" create="true"/>
    </queuePlacementPolicy>
</allocations>

Question 1:

What is the math for translating a queue's "weight" into an actual share of a resource? Given a cluster with, say, 100 GB of memory, what math do I do against the config above to work out how much memory each queue gets?
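To make the question concrete, here is the arithmetic as I currently understand it, as a sketch only: the 100 GB figure is hypothetical, I'm assuming steady-state fair share is simply proportional to weight, and I'm ignoring the min/maxResources caps.

```python
# Sketch: steady-state fair share proportional to queue weight.
# Weights come from our fair-scheduler.xml; the 100 GB cluster is
# hypothetical, and min/maxResources caps are ignored here.
weights = {"maintenance": 1.0, "prod": 3.0, "datascience": 3.0}
cluster_memory_gb = 100

total_weight = sum(weights.values())  # 7.0
fair_share = {q: cluster_memory_gb * w / total_weight for q, w in weights.items()}

for queue, share in fair_share.items():
    print(f"{queue}: {share:.1f} GB")
# maintenance: 14.3 GB
# prod: 42.9 GB
# datascience: 42.9 GB
```

Is this the right mental model, or does the scheduler do something different with the weights?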


Question 2:

Do I understand correctly that if a job is submitted to the datascience queue while nothing is running in any other queue, it will get all the resources it asks for (provided it doesn't exceed other limits such as yarn.nodemanager.resource.memory-mb)? And that only when a second job launches in the prod queue does the fair scheduler kick in and try to allocate the prod queue its fair share?
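A minimal sketch of the behavior I'm describing, under my assumption that the instantaneous fair share is divided only among queues that currently have demand (weights taken from our fair-scheduler.xml above; the 100 GB cluster size is again hypothetical):

```python
# Sketch of my assumption: instantaneous fair share is split only among
# queues that currently have demand. Weights are from our
# fair-scheduler.xml; the 100 GB cluster size is hypothetical.
weights = {"maintenance": 1.0, "prod": 3.0, "datascience": 3.0}
cluster_memory_gb = 100

def instantaneous_share(active_queues):
    """Divide cluster memory by weight among the active queues only."""
    total = sum(weights[q] for q in active_queues)
    return {q: cluster_memory_gb * weights[q] / total for q in active_queues}

# Only datascience has demand: it can use the whole cluster.
print(instantaneous_share({"datascience"}))          # {'datascience': 100.0}
# A prod job starts: equal weights, so the share splits 50/50.
print(instantaneous_share({"datascience", "prod"}))  # 50 GB each
```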


Question 3: 

What effect does setting minResources and maxResources on these queues have on resource allocation under fair scheduling? I read it as: a job launched in the prod queue can't request more than 125 GB of memory, so even if it's the only job running it will get at most 125 GB; and as long as the cluster has more memory than that available, a job launched in the datascience queue can be allocated up to 250 GB.
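In other words, my reading of maxResources as a hard per-queue cap, sketched out (caps are the memory values from the fair-scheduler.xml above; the requested amounts are hypothetical):

```python
# Sketch of my reading: maxResources is a hard per-queue cap that applies
# even when the rest of the cluster is idle. Caps are the memory values
# from our fair-scheduler.xml; the requested amounts are hypothetical.
max_memory_mb = {"maintenance": 125000, "prod": 125000, "datascience": 250000}

def capped_allocation_mb(queue, requested_mb):
    """A queue never holds more than its maxResources memory."""
    return min(requested_mb, max_memory_mb[queue])

print(capped_allocation_mb("prod", 200000))         # capped at 125000
print(capped_allocation_mb("datascience", 200000))  # under the cap: 200000
```

Is that the right way to think about it, or do the caps interact with the weights in some other way?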


Thank you in advance for your help in understanding this.

1 Reply

Re: Making Sense of YARN Fair Scheduler and Weights


There are a number of blog posts on our site that should help and answer all your questions.

This one here should cover the questions on weights and related topics, but please read all of them.


Wilfred
