Created on 03-21-2017 12:41 PM - last edited on 09-07-2021 07:40 AM by cjervis
Hi there,
I have enabled preemption for YARN as per: https://hortonworks.com/blog/better-slas-via-resource-preemption-in-yarns-capacityscheduler/
I observed that if the queues are already 100% occupied by Hive (Tez with container reuse enabled) or Spark jobs, a newly submitted job to any queue will not start until some of the existing tasks finish. At the same time, if I try to launch the Hive CLI, it also hangs indefinitely until some tasks finish and their resources are deallocated.
If Tez container reuse is disabled, new jobs do start getting resources - but this is not because of preemption: each container lasts only a few seconds, and the freed containers go to the new jobs. Spark is not affected either way - it will not release any resources.
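(For reference, the container-reuse behaviour described above is governed by the standard Tez flag below; a minimal tez-site.xml sketch of the toggle, not a recommendation to change it.)
<!-- tez-site.xml: controls whether Tez AMs hold on to containers and reuse them across tasks -->
<property>
  <name>tez.am.container.reuse.enabled</name>
  <!-- false = release each container after its task, as described above; default is true -->
  <value>false</value>
</property>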
Does anyone have a hint as to why preemption is not happening? Also, how can Spark jobs be preempted?
Values are as follows -
yarn.resourcemanager.scheduler.monitor.enable = true
yarn.resourcemanager.scheduler.monitor.policies = org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval = 3000
yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill = 15000
yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round = 0.1
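(As these settings would appear in the custom yarn-site.xml - a sketch restating only the values listed above.)
<!-- yarn-site.xml: preemption monitor settings listed above -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval</name>
  <value>3000</value>
</property>
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill</name>
  <value>15000</value>
</property>
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round</name>
  <value>0.1</value>
</property>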
Created 03-21-2017 08:40 PM
Preemption will not kill existing tasks that are running. As tasks for any given job finish, those resources are made available to the jobs in the queue that are relying on preemption. The idea behind using the queues is to assign a minimum amount of cluster resources to a given user/job. With preemption enabled, jobs can get access to a larger percentage of resources when they are available. If a new job comes in that requires a larger minimum percentage of resources than what is currently available, those resources will be made available as the currently running jobs' individual tasks complete.
What do your capacity scheduler queues look like in terms of percentage of cluster resources? What are the min and max values?
Created 03-21-2017 10:36 PM
Thanks @Michael Young for your answer. But, I don't think that's how it works.
As per my understanding, the case you describe is what happens when preemption is disabled - i.e., new tasks have to wait until the existing ones finish, and new jobs cannot start if the available resources are less than the minimum requirement. I think the whole point of preemption is to avoid this scenario by forcefully killing containers held by existing jobs in over-utilized queues if they do not release resources within the configured wait time.
Please see the blog post linked above; STEP #3 reads: "such containers will be forcefully killed by the ResourceManager to ensure that SLAs of applications in under-satisfied queues are met".
To answer your other question, I have 4 queues, Q1 to Q4, each with 25% min capacity and 100% max capacity. Q2 is divided into Q21 and Q22 with 50% (min) each. All of them use FIFO ordering.
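(For reference, that layout corresponds roughly to the following capacity-scheduler.xml entries. This is only a sketch built from the percentages above, using the standard CapacityScheduler property naming; the actual queue names and any additional settings may differ.)
<!-- capacity-scheduler.xml: four top-level queues, each 25% min / 100% max -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>Q1,Q2,Q3,Q4</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.Q1.capacity</name>
  <value>25</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.Q1.maximum-capacity</name>
  <value>100</value>
</property>
<!-- Q2, Q3 and Q4 get the same capacity/maximum-capacity values as Q1 -->
<!-- Q2 is split into two children of 50% each; FIFO ordering is the default -->
<property>
  <name>yarn.scheduler.capacity.root.Q2.queues</name>
  <value>Q21,Q22</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.Q2.Q21.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.Q2.Q22.capacity</name>
  <value>50</value>
</property>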
Created 03-27-2017 07:57 AM
After setting the below 2 parameters in the custom yarn-site.xml, things started working.
yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity
yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor
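(For context, these two properties are added to the same custom yarn-site.xml as the preemption settings above. The values shown below are only the documented Hadoop defaults, not necessarily what was used here - the poster did not state their values.)
<!-- yarn-site.xml: additional preemption tuning knobs -->
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity</name>
  <!-- default 0.1: how far a queue may sit over its capacity before it is considered for preemption -->
  <value>0.1</value>
</property>
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor</name>
  <!-- default 0.2: fraction of the outstanding preemption target reclaimed per round -->
  <value>0.2</value>
</property>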
Created 09-06-2021 01:55 PM
Can you please elaborate on what values you set for these parameters?