Created on 01-22-2018 02:46 PM - edited 08-17-2019 10:16 PM
I am trying to run two YARN queues so that, when only one queue is active, it consumes all the resources, and once a job arrives in the second queue, YARN preempts some of the first queue's resources to start the second job.
For that I followed the article "YARN Preemption with Spark using a Fair Policy", but for some reason only the leaf queues have the fair ordering policy option. In fact, the ordering policy selection doesn't exist at all on non-leaf queues.
I am running HDP version 2.6.3.0-235.
I have followed the instructions in the article exactly.
I have also tried manually setting the fair policy in the 'Scheduler' tab of YARN; when I do that, YARN crashes and will not start.
What can I do?
Screenshots are attached.
Created 01-23-2018 09:41 AM
You are doing everything just fine, this is by design. The "Ordering Policy" can indeed only be set for leaf queues, because it defines the ordering policy between applications in the same queue. So it has nothing to do with your use case.
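As a minimal sketch of what "leaf queue only" means in configuration terms: the ordering policy is a per-queue Capacity Scheduler property, and it only makes sense on a queue that applications are submitted to directly. The fragment below assumes test1 and test2 are leaf queues under root, as in this thread; the size-based-weight line is optional and shown only for illustration.

```properties
# Ordering between applications inside one leaf queue ("fifo" is the default).
yarn.scheduler.capacity.root.test1.ordering-policy=fair
yarn.scheduler.capacity.root.test2.ordering-policy=fair
# Optional: weight applications by size under the fair policy (defaults to false).
yarn.scheduler.capacity.root.test2.ordering-policy.fair.enable-size-based-weight=false
```

This only affects how applications within the same queue are ordered; it does not control how resources move between test1 and test2.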
"I am trying to run two YARN queues so that, when only one queue is active, it consumes all the resources, and once a job arrives in the second queue, YARN preempts some of the first queue's resources to start the second job."
To achieve this, you need to configure your queues like this (I think you already did this):
yarn.scheduler.capacity.root.queues=test1,test2
yarn.scheduler.capacity.root.test1.capacity=50
yarn.scheduler.capacity.root.test1.maximum-capacity=100
yarn.scheduler.capacity.root.test2.capacity=50
yarn.scheduler.capacity.root.test2.maximum-capacity=100
...
and enable preemption (as described in the article you attached). This lets the first application in the first queue use all the resources until a second job arrives in the second queue; the resources are then divided equally between the two queues.
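For reference, a sketch of the preemption settings usually involved. These are ResourceManager (yarn-site) properties rather than Capacity Scheduler queue properties. The interval, wait, and per-round values below are illustrative assumptions, not values taken from this thread or from the linked article, so check them against your cluster.

```properties
# Turn on the scheduler monitor and use the Capacity Scheduler preemption policy.
yarn.resourcemanager.scheduler.monitor.enable=true
yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
# How often (ms) the policy checks for queues running over their capacity.
yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval=3000
# Grace period (ms) between asking for a container back and killing it.
yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill=15000
# Upper bound on the fraction of cluster resources preempted in one round.
yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round=0.1
```

With a small per-round fraction, rebalancing between the queues happens gradually over several monitoring intervals rather than all at once.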
Hope this makes everything clear, give it a try 🙂
Created 01-23-2018 01:03 PM
@gnovak thx for the answer, I was able to make it work according to your explanation with one small addition.
I did have to allow a single user to use more than the queue's configured capacity with:
yarn.scheduler.capacity.root.test1.user-limit-factor=2
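To spell out why this setting matters, here is a short annotated sketch. It assumes the 50/100 capacity layout from the answer above; the test2 line is an assumption added for symmetry and was not part of the original post.

```properties
# user-limit-factor is a multiple of the queue's configured capacity that a
# single user may consume. With capacity=50:
#   factor 1 (default) -> one user is capped at 50% of the cluster
#   factor 2           -> one user may reach 100%, still bounded by maximum-capacity
yarn.scheduler.capacity.root.test1.user-limit-factor=2
# Assumed: the same setting on test2, so a lone user there can also fill an idle cluster.
yarn.scheduler.capacity.root.test2.user-limit-factor=2
```

Without this, a job submitted by one user would stop growing at the queue's configured capacity even when the other queue is idle.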
I have another question: I tried setting test2's ordering policy to fair and submitted two applications to it, but no preemption happened and one app simply took 100% of the queue. I tried submitting from different users as well, but couldn't make it happen. Is there another setting I need for this to work?
Created 01-23-2018 02:01 PM
@Anton P I'm glad it works.
I'm not sure exactly how the "fair" ordering policy works inside one queue, but preemption only happens between queues. I assume it will try to share resources equally among the applications/users in the same queue, but once a container is running it will not preempt it. If you would like to achieve that, you should consider creating sub-queues.
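A rough sketch of what the sub-queue approach could look like, under the assumption that test2 is split into two leaf queues. The names sub1 and sub2 are hypothetical, and the 50/50 split simply mirrors the layout used for test1 and test2.

```properties
# test2 becomes a parent queue; applications are then submitted to its leaves.
yarn.scheduler.capacity.root.test2.queues=sub1,sub2
# Child capacities are percentages of the parent and must add up to 100.
yarn.scheduler.capacity.root.test2.sub1.capacity=50
yarn.scheduler.capacity.root.test2.sub1.maximum-capacity=100
yarn.scheduler.capacity.root.test2.sub2.capacity=50
yarn.scheduler.capacity.root.test2.sub2.maximum-capacity=100
```

Jobs would then have to target a leaf queue (sub1 or sub2) instead of test2, and a user-limit-factor above 1 may be needed on each leaf, as it was for test1.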