Created on 06-24-2021 12:04 AM - last edited on 06-24-2021 12:19 AM by VidyaSargur
Hello,
I am facing an issue with resource allocation for individual jobs in a queue.
My 'Dev' queue details =
How can I make sure that an individual job can take unlimited resources in a particular queue?
For now I have increased the resources for the 'Dev' queue from 30% to 80%, as 'Maximum user limit 100%' is not helping.
Regards,
Amey.
Created on 06-30-2021 03:49 PM - edited 06-30-2021 03:50 PM
@dmharshit
It's difficult to explain in 3 minutes, but the Capacity Scheduler in YARN allows multi-tenancy of a Hadoop cluster, where multiple users can share one large cluster.
Giving every company its own private cluster leads to poor resource utilization. Although a private cluster may provide enough resources to meet peak demand, that peak may not occur very often, so the cluster sits underutilized the rest of the time.
Thus sharing clusters among companies is a more cost-effective idea. However, companies are concerned about sharing a cluster because they worry that they may not get enough resources at times of peak utilization. The CapacityScheduler in YARN mitigates that concern by giving each company capacity guarantees.
Capacity scheduler in YARN functionality
The Capacity Scheduler in Hadoop works on the concept of queues. Each department gets its own dedicated queue with a percentage of the total cluster capacity for its own use. For example, if there are two departments sharing the cluster, one department may be given 60% of the cluster capacity and the other department 40%.
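To make the 60/40 split concrete, here is a minimal Python sketch of the arithmetic. It is only an illustration, not YARN code: the cluster size of 1000 GB and the queue names dept_a and dept_b are assumptions made up for the example.

```python
# Hypothetical cluster size, assumed for illustration only.
CLUSTER_MEMORY_GB = 1000

# Queue name -> configured capacity as a percentage of the cluster.
# The names are hypothetical; the 60/40 split is the example from the post.
queue_capacity_pct = {"dept_a": 60, "dept_b": 40}


def guaranteed_memory_gb(capacities, cluster_gb):
    """Return each queue's guaranteed share of cluster memory in GB."""
    total = sum(capacities.values())
    # Sibling queue capacities are expected to add up to 100%.
    if total != 100:
        raise ValueError(f"queue capacities must sum to 100, got {total}")
    return {name: cluster_gb * pct / 100 for name, pct in capacities.items()}


shares = guaranteed_memory_gb(queue_capacity_pct, CLUSTER_MEMORY_GB)
print(shares)
```

With these assumed numbers, dept_a is guaranteed 600 GB and dept_b 400 GB.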
On top of that, to provide further control and predictability over the sharing of resources, the CapacityScheduler supports hierarchical queues. A department can further divide its allocated cluster capacity into separate sub-queues for separate sets of users within the department.
The Capacity Scheduler is also flexible and allows free resources to be allocated to any queue beyond its capacity. This provides elasticity for the companies in a cost-effective manner. When the queue to which these resources actually belong sees increased demand, the resources are given back to it as they are released by the other queues.
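The elasticity described above can be sketched roughly as follows. This is a deliberately simplified model, not the scheduler's actual algorithm: it ignores per-user limits, preemption, and container sizes, and the function name and all numbers are assumptions chosen for illustration.

```python
def usable_pct(capacity_pct, maximum_capacity_pct, idle_elsewhere_pct):
    """Upper bound on what a queue can use right now, as a % of the cluster.

    The queue keeps its guaranteed capacity and may borrow whatever is idle
    in other queues, but never more than its configured maximum capacity.
    """
    return min(maximum_capacity_pct, capacity_pct + idle_elsewhere_pct)


# Guaranteed 30%, maximum 100%, 50% of the cluster idle in other queues.
print(usable_pct(30, 100, 50))  # 80

# Same queue, but the maximum capacity caps borrowing at 40%.
print(usable_pct(30, 40, 50))   # 40

# Nothing idle elsewhere: the queue is limited to its guarantee.
print(usable_pct(30, 100, 0))   # 30
```

In this model a low maximum capacity is one possible reason a queue stops growing while other queues are idle. In a real cluster, per-user limits can have the same effect.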
This is a fantastic write-up: YARN - The Capacity Scheduler
The maximum capacity is an elastic-like capacity that allows queues to make use of resources that are not being used to fill minimum capacity demand in other queues.
Child queues, like those in the figure above, inherit the resources of their parent queue. For example, within the Preference branch, the Low leaf queue gets 20% of Preference's 20% minimum capacity, while the High leaf gets 80% of that 20% minimum capacity. Minimum capacity always has to add up to 100% across all the leaves under a parent.
I didn't have the opportunity tonight to build a cluster mirroring the above setup and share the capacity scheduler config to give you a better understanding.
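The inheritance arithmetic can be worked through with a short Python sketch. It only illustrates the percentages; the nested-dict layout is an assumption, and the "Other" queue is a hypothetical placeholder for the remaining 80% of the cluster, since the figure is not reproduced here.

```python
# Each entry: queue name -> (capacity as % of its parent, child queues).
# Only the Preference/Low/High numbers come from the post; "Other" is a
# hypothetical placeholder so the top level adds up to 100%.
queues = {
    "Preference": (20, {"Low": (20, {}), "High": (80, {})}),
    "Other": (80, {}),
}


def absolute_capacities(tree, parent_pct=100.0, prefix="root"):
    """Return leaf queue path -> capacity as a % of the whole cluster."""
    total = sum(pct for pct, _ in tree.values())
    # Minimum capacities of the children under one parent must sum to 100%.
    if total != 100:
        raise ValueError(f"children of {prefix} sum to {total}, expected 100")
    result = {}
    for name, (pct, children) in tree.items():
        path = f"{prefix}.{name}"
        absolute = parent_pct * pct / 100
        if children:
            result.update(absolute_capacities(children, absolute, path))
        else:
            result[path] = absolute
    return result


leaves = absolute_capacities(queues)
for path, pct in leaves.items():
    print(f"{path}: {pct}% of the cluster")
```

Under these assumptions Low ends up with 4% of the cluster (20% of 20%) and High with 16% (80% of 20%).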
Created 06-28-2021 03:45 AM
I also observed that the queue is unable to use resources from other queues even when those resources are idle.
Is anybody else facing this issue?
Let's say the 'Dev' queue has a weightage of 30% with 100% capacity enabled.
The queue maxes out at 40% capacity even if more resources are available in other queues.
Created 06-30-2021 07:19 AM
I assume you are using the Capacity Scheduler, not the Fair Scheduler. That's why queues won't take available resources from other queues. You can read more about that here: Comparison of Fair Scheduler with Capacity Scheduler | CDP Public Cloud (cloudera.com).
Created 07-12-2021 01:59 AM
@tarekabouzeid91 wrote: I assume you are using the Capacity Scheduler, not the Fair Scheduler. That's why queues won't take available resources from other queues. You can read more about that here: Comparison of Fair Scheduler with Capacity Scheduler | CDP Public Cloud (cloudera.com).
Yes, I am using the Capacity Scheduler.
yarn.resourcemanager.scheduler.class = org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler