Preemption using Capacity Scheduler

Hello,

 

As we move from CDH using the Fair Scheduler to CDP using the Capacity Scheduler, we're trying to achieve certain results we expect to be possible, but so far can't seem to make them happen. I'll try to set up the issue as concisely as I can...

 

We have two queues: root.core.deposits.high and root.core.deposits.priority. "core" has a capacity value of 60%, "deposits" has 70%, "high" has 35%, and "priority" has 50%. (There is also a root.core.deposits.low, where low = 15%, but that's not relevant for the following example.)

 

For the leaf queues, Minimum User Limit is 25%, User Limit Factor is set arbitrarily high (100) to effectively turn it off, and the Ordering Policy is Fair.
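For reference, here's my rough mental model of how Minimum User Limit and User Limit Factor combine inside a leaf queue (a simplified sketch based on my reading of the docs, not the actual scheduler code; the function and its name are just illustrative):

```python
# Rough sketch of the per-user limit inside a leaf queue, as I understand it
# from the Capacity Scheduler docs (simplified; not the real scheduler code).
def approx_user_limit(queue_capacity, active_users,
                      min_user_limit_pct=25, user_limit_factor=100):
    # Floor: the queue is divided among active users, but no user's share
    # drops below minimum-user-limit-percent of the queue.
    floor = max(queue_capacity / active_users,
                queue_capacity * min_user_limit_pct / 100)
    # Ceiling: a single user can grow up to user-limit-factor * capacity.
    ceiling = queue_capacity * user_limit_factor
    return min(floor, ceiling)

# With our settings (MULP = 25, ULF = 100): one active user can fill the
# whole queue, two users get ~50% each, five users get 25% each, etc.
print(approx_user_limit(100, 1))  # 100.0
print(approx_user_limit(100, 2))  # 50.0
print(approx_user_limit(100, 5))  # 25.0
```

That's why we set User Limit Factor to 100: a single user should be able to fill the queue (and beyond, when elasticity allows).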

 

Currently we have 410 GB and 64 cores allocated to YARN. Both inter-queue and intra-queue preemption are turned on in YARN (and preemption does seem to function, as we do see SOME preemption).
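To put the percentages above in absolute terms, here is the guaranteed share I believe each leaf queue works out to, assuming each queue's capacity is relative to its parent (my own back-of-the-envelope arithmetic, not values read from the RM UI):

```python
# Back-of-the-envelope guaranteed capacities, assuming each queue's capacity
# percentage applies to its parent's share.
cluster_mem_gb, cluster_vcores = 410, 64

core     = 0.60                 # root.core
deposits = core * 0.70          # root.core.deposits
high     = deposits * 0.35      # root.core.deposits.high     -> ~14.7% of cluster
priority = deposits * 0.50      # root.core.deposits.priority -> ~21% of cluster

for name, share in [("high", high), ("priority", priority)]:
    print(f"{name}: {share:.1%} of cluster = "
          f"{share * cluster_mem_gb:.0f} GB / {share * cluster_vcores:.1f} vcores")
# high: 14.7% of cluster = 60 GB / 9.4 vcores
# priority: 21.0% of cluster = 86 GB / 13.4 vcores
```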

 

I've run four variations of my test, as shown below. My test job is a Pi calculation that generates 500,000 containers, so there are plenty of pending containers to trigger preemption. We've made the following observations.

 

1) With two jobs submitted to the same queue by the same user, there is no intra-queue preemption. The jobs run completely FIFO.

2) With different users but the same queue, the jobs preempt and appear to try to achieve an even distribution of resources (much like the behavior under the Fair Scheduler, though in that scenario resources only moved as a job released them naturally, rather than being forcibly preempted).

3) With different queues, I see preemption work, but very slowly (one container at a time).

 

Q) How can we get intra-queue preemption to work for two jobs from the same user? Is that possible?

 

Q) How can we increase the aggressiveness of preemption when running with different queues? I've tried changing Total Resources Per Round to 90%, Over Capacity Tolerance to 0, and Maximum Termination Factor to 90%, but the number of containers preempted per 15s preemption polling round remained at 1. (I know I could change the polling interval to less than 15s, but I think I should be able to move more than one container in a shot.) Enable Multiple Assignments Per Heartbeat is set to True, and changing Maximum Off-Switch Assignments Per Heartbeat from 1 to 10 had no effect either.
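For context, this is the simplified mental model I've been using for how those three settings size a single preemption round (loosely based on my reading of ProportionalCapacityPreemptionPolicy; the function and parameter names are just my shorthand for the Cloudera Manager settings above, and the numbers in the example are made up):

```python
# Simplified sketch of how one preemption round sizes its target (my own
# approximation; the real policy works per queue/partition and is more involved).
def containers_per_round(queue_used, queue_guaranteed, cluster_total,
                         container_size,
                         total_preemption_per_round=0.9,   # Total Resources Per Round
                         over_capacity_tolerance=0.0,      # Over Capacity Tolerance
                         termination_factor=0.9):          # Maximum Termination Factor
    # Usage inside the dead zone around the guaranteed capacity is ignored.
    if queue_used <= queue_guaranteed * (1 + over_capacity_tolerance):
        return 0
    needed = queue_used - queue_guaranteed
    # Only a fraction of the gap is reclaimed each round, capped by the
    # per-round limit on total cluster resources that may be preempted.
    target = min(needed * termination_factor,
                 cluster_total * total_preemption_per_round)
    return int(target // container_size)

# Hypothetical example: a queue 120 GB over its guarantee on our 410 GB
# cluster, with 4 GB containers -> this model says ~27 containers per round.
print(containers_per_round(queue_used=180, queue_guaranteed=60,
                           cluster_total=410, container_size=4))  # 27
```

If that model is even roughly right, the aggressive values above should allow far more than one container per round, which is why the observed one-container-per-round behavior puzzles me.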

 

Any insight into how we can dial things in to achieve what we hope is possible would be appreciated.  Please let me know if I can provide any more data.  Thanks.

 

Mike

 

 

       
Test matrix:

1) Same user / same queue: user1 -> root.core.deposits.high, user1 -> root.core.deposits.high
   Result: no observed preemption; the runs operate in a FIFO manner (intra-queue, same user)

2) Different users / same queue: user1 -> root.core.deposits.high, user2 -> root.core.deposits.high
   Result: 1 container preempted every 15 seconds until the first job completes (intra-queue, different users)

3) Same user / different queues: user1 -> root.core.deposits.high, user1 -> root.core.deposits.priority
   Result: 1 container preempted every 15 seconds, up to 13 containers; no additional containers preempted until the first job completes

4) Different users / different queues: user2 -> root.core.deposits.high, user1 -> root.core.deposits.priority
   Result: 1 container preempted every 15 seconds, up to 13 containers; no additional containers preempted until the first job completes