Support Questions
Find answers, ask questions, and share your expertise

Capacity scheduler preemption doesn't work

Explorer

Hi there,

I have enabled preemption for YARN as per: https://hortonworks.com/blog/better-slas-via-resource-preemption-in-yarns-capacityscheduler/

I observed that if the queues are already 100% occupied by Hive (Tez with container reuse enabled) or Spark jobs, and a new job is submitted to any queue, it will not start until some of the existing tasks finish. Likewise, if I try to launch the Hive CLI, it hangs forever until some tasks finish and resources are deallocated.

If Tez container reuse is disabled, new jobs do start getting resources - but this is not because of preemption: each container lasts only a few seconds, and the freed containers go to the new jobs. Spark is not affected either way - it will not release any resources.

Does anyone have a hint as to why preemption is not happening? Also, how can Spark jobs be preempted?

The values are as follows:

yarn.resourcemanager.scheduler.monitor.enable = true
yarn.resourcemanager.scheduler.monitor.policies = org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval = 3000
yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill = 15000
yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round = 0.1
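In yarn-site.xml form, the settings above would look roughly like this (a sketch assembled from the list above; nothing here beyond the properties and values already stated):

```xml
<configuration>
  <!-- Enable the scheduler monitor that drives preemption -->
  <property>
    <name>yarn.resourcemanager.scheduler.monitor.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.monitor.policies</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
  </property>
  <!-- How often (ms) the policy checks for preemption candidates -->
  <property>
    <name>yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval</name>
    <value>3000</value>
  </property>
  <!-- Grace period (ms) before a marked container is forcibly killed -->
  <property>
    <name>yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill</name>
    <value>15000</value>
  </property>
  <!-- At most 10% of cluster resources preempted in a single round -->
  <property>
    <name>yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round</name>
    <value>0.1</value>
  </property>
</configuration>
```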
1 ACCEPTED SOLUTION

Explorer

After setting the two parameters below in a custom yarn-site.xml, things started working.

yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity

yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor
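For anyone reading later: these two knobs have shipped defaults in the CapacityScheduler (0.1 for max_ignored_over_capacity, meaning a queue must exceed its guaranteed capacity by more than 10% before it is considered for preemption, and 0.2 for natural_termination_factor, meaning only about 20% of the outstanding deficit is reclaimed per monitoring round). Declaring them explicitly might look like the sketch below - the values shown are the documented defaults, not necessarily what the poster used:

```xml
<!-- Dead zone: ignore over-capacity usage up to 10% of a queue's guarantee -->
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity</name>
  <value>0.1</value>
</property>
<!-- Geometric decay: reclaim ~20% of the remaining deficit each round -->
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor</name>
  <value>0.2</value>
</property>
```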


4 REPLIES

@tuxnet

Preemption will not kill existing tasks that are running. As tasks for any given job finish, those resources are made available to the jobs in the queue that are relying on preemption. The idea behind the queues is to assign a minimum amount of cluster resources to a given user/job. With preemption enabled, jobs can get access to a larger percentage of resources when they are available. If a new job comes in that requires a larger minimum share of resources than is currently available, those resources will be made available as the currently running jobs' individual tasks complete.

What do your Capacity Scheduler queues look like in terms of percentage of cluster resources? What are the min and max values?

Explorer

Thanks @Michael Young for your answer. But I don't think that's how it works.

As I understand it, the case you describe is what happens when preemption is disabled - i.e., new tasks have to wait until existing ones finish, and new ones cannot start if the available resources are less than the minimum requirement. I think the whole point of preemption is to avoid this scenario by forcefully killing containers held by existing jobs in over-utilized queues if they do not release resources within 'x' amount of time.

Please see here: STEP #3 reads, "such containers will be forcefully killed by the ResourceManager to ensure that SLAs of applications in under-satisfied queues are met".

To answer your other question, I have 4 queues, Q1 to Q4, each with 25% min capacity and 100% max capacity. Q2 is divided into Q21 and Q22 with 50% (min) each. All of them use FIFO.
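That queue layout would correspond to a capacity-scheduler.xml along these lines (a hedged sketch using the standard CapacityScheduler property names; the queue names match the ones above, and the Q1 pattern repeats for Q3 and Q4):

```xml
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>Q1,Q2,Q3,Q4</value>
  </property>
  <!-- Each top-level queue: 25% guaranteed, allowed to grow to 100% -->
  <property>
    <name>yarn.scheduler.capacity.root.Q1.capacity</name>
    <value>25</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.Q1.maximum-capacity</name>
    <value>100</value>
  </property>
  <!-- Q2 is split into two child queues at 50% each -->
  <property>
    <name>yarn.scheduler.capacity.root.Q2.queues</name>
    <value>Q21,Q22</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.Q2.Q21.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.Q2.Q22.capacity</name>
    <value>50</value>
  </property>
</configuration>
```

With every queue's maximum-capacity at 100%, any single queue can fill the whole cluster, which is exactly the situation where the preemption monitor has to claw resources back for the other queues.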


New Contributor

Can you please elaborate on what values you set for these parameters?