New Contributor
Posts: 2
Registered: ‎12-19-2013

Scheduler in Hadoop with CDH 4.2.1

We have a couple of questions about scheduler configuration options. We've had a couple of incidents where a single user's job blocked other users' jobs by consuming all of the mappers. The jobs finished; however, because those jobs were tied to a web application, the web interface timed out. We are also concerned that in the future large analytical MR jobs could consume CPU and memory and severely impact search jobs that are tied to a web interface, causing timeouts. We are currently using the FIFO scheduler. We want to give priority to the web application jobs and limit other analytic jobs.

Answers to these specific questions would be helpful:

  1. To confirm: with the FIFO scheduler, there aren't any site/cluster configuration options to prevent jobs from waiting on or conflicting with each other?
  2. As a workaround, could we limit mappers at run time when submitting large analytic jobs?
  3. Does it make sense to transition to the Fair Scheduler or YARN for better multi-tenant Hadoop?
  4. If so, given that we are using CDH 4.2.1, what is the upgrade path to the Fair Scheduler or YARN?
Cloudera Employee
Posts: 3
Registered: ‎12-06-2013

Re: Scheduler in Hadoop with CDH 4.2.1

Hi pbraxton,

 

That's correct: the FIFO Scheduler does not provide mechanisms to avoid these conflicts.  We recommend moving to the Fair Scheduler for multitenancy.  While the YARN/MR2 Fair Scheduler has additional features (hierarchical pools, multi-resource scheduling), the MR1 Fair Scheduler will likely support your use case as well.

All you need to do to start using it is to change a single property in your JobTracker's configuration and add a fair-scheduler.xml allocations file.  Here's the documentation on the MR1 Fair Scheduler:

https://hadoop.apache.org/docs/r1.2.1/fair_scheduler.html
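
For illustration, here is a rough sketch of what those two pieces might look like on MR1. The property and element names are the ones described in the documentation linked above; the file path, the pool names (webapp, analytics), and all of the numbers are made-up placeholders rather than recommendations, so treat this as a starting point to adapt, not a tested configuration for your cluster.

```xml
<!-- mapred-site.xml on the JobTracker (fragment) -->
<configuration>
  <!-- Switch the JobTracker from the default FIFO scheduler to the Fair Scheduler -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>
  <!-- Location of the allocations file; this path is only an example -->
  <property>
    <name>mapred.fairscheduler.allocation.file</name>
    <value>/etc/hadoop/conf/fair-scheduler.xml</value>
  </property>
</configuration>
```

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: pool names and values below are hypothetical -->
<allocations>
  <!-- Pool for the jobs behind the web interface: guaranteed slots and a higher weight -->
  <pool name="webapp">
    <minMaps>20</minMaps>
    <minReduces>5</minReduces>
    <weight>2.0</weight>
  </pool>
  <!-- Pool for large analytic jobs: capped so it cannot take every map slot -->
  <pool name="analytics">
    <maxMaps>40</maxMaps>
    <maxReduces>10</maxReduces>
    <maxRunningJobs>3</maxRunningJobs>
  </pool>
</allocations>
```

By default the MR1 Fair Scheduler puts each job in a pool named after the submitting user. To use named pools like the ones sketched here, a job can set the mapred.fairscheduler.pool property at submission time. The scheduler change in mapred-site.xml should need a JobTracker restart to take effect, while the allocations file is reloaded periodically at runtime.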

 

If you have any additional questions on it I'd be happy to try to answer them.

 

-Sandy
