Scheduler in Hadoop with CDH 4.2.1


New Contributor

We have a couple of questions about scheduler configuration options. We've had a few incidents where a single user's job blocked other users' jobs by consuming all of the map slots. The jobs did finish; however, because they were tied to a web application, the web interface timed out. We are also concerned that, in the future, large analytical MapReduce jobs could consume CPU and memory and severely impact search jobs that are tied to a web interface, causing further timeouts. We currently use the FIFO scheduler. We want to give priority to the web application's jobs and limit the other analytic jobs.


Answers to these specific questions would be helpful:


  1. To confirm: with the FIFO scheduler, are there no site/cluster configuration options to prevent job waiting and conflict issues?
  2. As a workaround, could we limit the number of mappers at run time when submitting large analytic jobs?
  3. Does it make sense to move to the Fair Scheduler or to YARN for better multi-tenant Hadoop?
  4. If so, given that we are on CDH 4.2.1, what is the upgrade path to the Fair Scheduler or YARN?
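Regarding question 2: in MR1 the number of map tasks a job launches comes from its input splits rather than from a direct setting (mapred.map.tasks only feeds a "goal" split size). The sketch below is a simplified, single-file model of the split-sizing arithmetic in the old-API FileInputFormat, where split size is max(minSize, min(goalSize, blockSize)) and the last split may run up to 10% over. It illustrates how raising mapred.min.split.size above the block size reduces the map-task count. The function names are hypothetical, and it ignores multi-file inputs and non-splittable (e.g. gzip) files, which always produce one split per file.

```python
SPLIT_SLOP = 1.1  # the last chunk may be up to 10% larger than split_size


def compute_split_size(goal_size: int, min_size: int, block_size: int) -> int:
    """max(minSize, min(goalSize, blockSize)), as in the old-API FileInputFormat."""
    return max(min_size, min(goal_size, block_size))


def count_splits(file_size: int, block_size: int, map_tasks_hint: int = 1,
                 min_split_size: int = 1) -> int:
    """Approximate map-task count for ONE splittable input file (simplified model)."""
    goal_size = file_size // max(map_tasks_hint, 1)  # mapred.map.tasks is only a hint
    split_size = compute_split_size(goal_size, max(min_split_size, 1), block_size)
    splits, remaining = 0, file_size
    # Carve off full-size splits while more than 1.1 splits' worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining != 0:  # whatever is left becomes the final split
        splits += 1
    return splits


if __name__ == "__main__":
    ten_gib, block = 10 * 1024**3, 128 * 1024**2
    print(count_splits(ten_gib, block))                          # default split size
    print(count_splits(ten_gib, block, min_split_size=1024**3))  # 1 GiB minimum split
```

With 128 MiB blocks, a 10 GiB file yields 80 map tasks by default and 10 with a 1 GiB minimum split size. A job with fewer map tasks than the cluster has map slots cannot occupy every slot, but each task then processes more data and runs longer, so this is a blunt workaround rather than a scheduling policy.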

Re: Scheduler in Hadoop with CDH 4.2.1

Cloudera Employee

Hi pbraxton,


That's correct: the FIFO Scheduler does not provide any mechanism to avoid these conflicts. We recommend moving to the Fair Scheduler for multitenancy. While the YARN/MR2 Fair Scheduler has additional features (hierarchical pools, multi-resource scheduling), the MR1 Fair Scheduler will likely support your use case as well.
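To illustrate why FIFO lets one job starve the others while fair sharing does not, here is a deliberately simplified snapshot model: a fixed pool of map slots is divided among jobs that each have some number of pending tasks. FIFO hands slots out strictly in submission order; the fair variant uses max-min fair sharing (round-robin, one slot at a time, skipping jobs with no pending work). This is not the Fair Scheduler's actual code: the real MR1 scheduler assigns slots incrementally on TaskTracker heartbeats and adds pools, minimum shares, weights, and optional preemption. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def fifo_allocate(slots: int, jobs: List[Tuple[str, int]]) -> Dict[str, int]:
    """FIFO: earlier jobs take as many slots as they have pending tasks."""
    alloc: Dict[str, int] = {}
    free = slots
    for name, demand in jobs:  # jobs are listed in submission order
        take = min(demand, free)
        alloc[name] = take
        free -= take
    return alloc


def fair_allocate(slots: int, jobs: List[Tuple[str, int]]) -> Dict[str, int]:
    """Max-min fair share: round-robin single slots to jobs with pending tasks."""
    alloc = {name: 0 for name, _ in jobs}
    demand = dict(jobs)
    free = slots
    while free > 0:
        progressed = False
        for name, _ in jobs:
            if free == 0:
                break
            if alloc[name] < demand[name]:
                alloc[name] += 1
                free -= 1
                progressed = True
        if not progressed:  # every job's demand is already met
            break
    return alloc


if __name__ == "__main__":
    # Hypothetical cluster: 100 map slots, a big analytics job submitted first.
    jobs = [("analytics", 500), ("web_search", 20)]
    print(fifo_allocate(100, jobs))
    print(fair_allocate(100, jobs))
```

Under FIFO the analytics job takes all 100 slots and the search job gets none; under fair sharing the search job gets the 20 slots it needs and the analytics job uses the remaining 80, so the cluster stays fully utilized either way.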


All you need to do to start using it is change a single property in your JobTracker's configuration (mapred.jobtracker.taskScheduler, set to org.apache.hadoop.mapred.FairScheduler) and add a fair-scheduler.xml allocations file. Here's the documentation on the MR1 Fair Scheduler.
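As a sketch of what those two pieces might look like, the script below builds the mapred-site.xml properties and a fair-scheduler.xml allocations file with Python's standard ElementTree. The pool names, limits, and file path are hypothetical; the element names (minMaps, minReduces, maxMaps, maxReduces, maxRunningJobs, weight) should be checked against the MR1 Fair Scheduler documentation for your CDH version. By default the MR1 Fair Scheduler places each job in a pool named after the submitting user (mapred.fairscheduler.poolnameproperty defaults to user.name), so one option is to name pools after the accounts the web application and the analysts submit as.

```python
import xml.etree.ElementTree as ET

# Hypothetical path; use any location the JobTracker can read.
ALLOC_FILE = "/etc/hadoop/conf/fair-scheduler.xml"


def mapred_site_properties(alloc_file: str) -> ET.Element:
    """Build the <property> entries to add to the JobTracker's mapred-site.xml."""
    conf = ET.Element("configuration")
    for name, value in [
        # Replace the default FIFO scheduler with the MR1 Fair Scheduler.
        ("mapred.jobtracker.taskScheduler", "org.apache.hadoop.mapred.FairScheduler"),
        # Point the Fair Scheduler at its allocations file.
        ("mapred.fairscheduler.allocation.file", alloc_file),
    ]:
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return conf


def allocations(pools: dict) -> ET.Element:
    """Build an <allocations> document from {pool_name: {element: value}}."""
    root = ET.Element("allocations")
    for pool_name, settings in pools.items():
        pool = ET.SubElement(root, "pool", name=pool_name)
        for element, value in settings.items():
            ET.SubElement(pool, element).text = str(value)
    return root


if __name__ == "__main__":
    # Hypothetical pools: guarantee slots to the web app, cap the analytics work.
    pools = {
        "webapp": {"minMaps": 20, "minReduces": 5, "weight": 2.0},
        "analytics": {"maxMaps": 60, "maxReduces": 20, "maxRunningJobs": 3},
    }
    print(ET.tostring(mapred_site_properties(ALLOC_FILE), encoding="unicode"))
    print(ET.tostring(allocations(pools), encoding="unicode"))
```

The minimum shares give the web application's pool a guaranteed number of slots whenever it has work, while the caps keep a large analytics run from occupying the whole cluster; the exact numbers depend on your slot counts and workload.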


If you have any additional questions on it I'd be happy to try to answer them.