
Cloudera’s Fair Scheduler vs. Capacity Scheduler, which one is the best option to choose?

Contributor

Cloudera’s Fair Scheduler vs. Capacity Scheduler: which one is the better option to choose? What are the main differences between these two schedulers?

Accepted Solutions

Re: Cloudera’s Fair Scheduler vs. Capacity Scheduler, which one is the best option to choose?

Master Collaborator

The Fair Scheduler is recommended by Cloudera. Here is some background:

http://blog.cloudera.com/blog/2016/01/untangling-apache-hadoop-yarn-part-3/

Re: Cloudera’s Fair Scheduler vs. Capacity Scheduler, which one is the best option to choose?

New Contributor

I am not able to understand the difference between the Fair Scheduler and the Capacity Scheduler. From what I have read, I understood that the two are identical except that the Capacity Scheduler uses FIFO for the users within a queue. I am not sure what this means or whether it is the whole truth. It would be really helpful if someone could explain this in plain and simple words.

Re: Cloudera’s Fair Scheduler vs. Capacity Scheduler, which one is the best option to choose?

Master Collaborator

This might clear things up:

Fair - Allocates resources to weighted pools, with fair sharing within each pool (docs).

Capacity - Allocates resources to pools, with FIFO scheduling within each pool (docs).
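
To make the one-line summaries above concrete, here is a toy Python sketch. It is not how YARN implements either scheduler; it only models the two ideas named above: capacity is divided among pools, and inside a pool the applications are served either in FIFO order or by fair sharing. The function names, pool names, weights, and resource numbers are all made up for illustration, and real schedulers also deal with memory and vcores, minimum and maximum shares, and other settings that are ignored here.

```python
def split_by_weight(cluster_capacity, pool_weights):
    """Divide cluster capacity among pools in proportion to their weights."""
    total = sum(pool_weights.values())
    return {pool: cluster_capacity * w / total for pool, w in pool_weights.items()}


def fifo_within_pool(pool_capacity, demands):
    """Serve apps in submission order; each takes what it needs until nothing is left."""
    allocation = {}
    remaining = pool_capacity
    for app, demand in demands.items():  # dicts keep insertion (submission) order
        grant = min(demand, remaining)
        allocation[app] = grant
        remaining -= grant
    return allocation


def fair_within_pool(pool_capacity, demands):
    """Share capacity equally among apps, capped at each app's demand (max-min fairness)."""
    allocation = {app: 0 for app in demands}
    remaining = pool_capacity
    active = [app for app, d in demands.items() if d > 0]
    while active and remaining > 0:
        share = remaining / len(active)
        still_hungry = []
        for app in active:
            grant = min(share, demands[app] - allocation[app])
            allocation[app] += grant
            remaining -= grant
            if allocation[app] < demands[app]:
                still_hungry.append(app)
        if len(still_hungry) == len(active):
            break  # every app took a full share, so the pool is used up
        active = still_hungry  # redistribute what the satisfied apps left over
    return allocation


# Hypothetical cluster: 90 resource units, two pools weighted 2:1.
pools = split_by_weight(90, {"etl": 2, "adhoc": 1})

# Three apps submitted to the "etl" pool in the order A, B, C.
demands = {"A": 50, "B": 10, "C": 40}
fifo = fifo_within_pool(pools["etl"], demands)
fair = fair_within_pool(pools["etl"], demands)

print("pool capacities:", pools)
print("FIFO within pool:", fifo)
print("fair within pool:", fair)
```

In this toy run the "etl" pool gets 60 units. With FIFO, A and B are fully served and C gets nothing until they finish; with fair sharing, B gets its full 10 and A and C get 25 each.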

Re: Cloudera’s Fair Scheduler vs. Capacity Scheduler, which one is the best option to choose?

New Contributor

Hi jkestelyn,

Thanks for the links, but they have caused even more confusion! The Apache documentation for the Capacity Scheduler says:

"When there is demand for these resources from queues running below capacity at a future point in time, as tasks scheduled on these resources complete, they will be assigned to applications on queues running below the capacity (pre-emption is not supported)"

Whereas the Cloudera documentation, "Job Scheduling in Apache Hadoop", says:

"The Capacity Scheduler also supports configuring a wait time on each queue after which it is allowed to preempt other queues’ tasks if it is below its fair share"

These two statements contradict each other. So, once again, please clarify what the actual behavior is and what the difference between these two scheduling methods is.

Re: Cloudera’s Fair Scheduler vs. Capacity Scheduler, which one is the best option to choose?

Master Collaborator

Hi,

The Cloudera "documentation" you reference here is actually an 8-year-old blog post. I would defer to the more current docs.