Explorer
Posts: 35
Registered: ‎11-24-2015

slots/containers limitation?

If I have only one job running on a host (either MapReduce or YARN), will it be able to use all the resources available on the host, or only one slot/container for map and one slot/container for reduce?

 

Appreciate the insights.

Posts: 1,885
Kudos: 422
Solutions: 298
Registered: ‎07-31-2013

Re: slots/containers limitation

Jobs are not limited to the use of a single container per host. Assuming no
scheduler level restrictions, and proper resource requests, your job should
consume multiple containers on the same host.
Explorer
Posts: 35
Registered: ‎11-24-2015

Re: slots/containers limitation

Thanks for the response.

 

Can you explain this further, please: "no scheduler level restrictions, and proper resource requests"?

Posts: 1,885
Kudos: 422
Solutions: 298
Registered: ‎07-31-2013

Re: slots/containers limitation

You can configure queues in schedulers with various restrictions on
applications that land in such a queue. Please read
http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Config...

As to resource requests, I meant the properties of MR2 such as
http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/m...
These should be a small factor of the value of
http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml#yarn.n...
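
To illustrate why the per-task request should be small relative to what the node offers, here is a rough sketch of the arithmetic. It is a simplification, not how the scheduler is actually implemented: the function name and the sample numbers are hypothetical, and it assumes the per-task request is a memory/vcore value like mapreduce.map.memory.mb and the node capacity is something like yarn.nodemanager.resource.memory-mb (the truncated links above may point at different properties). It also ignores the ApplicationMaster container, allocation rounding, and any queue limits.

```python
def containers_per_node(node_memory_mb: int, node_vcores: int,
                        task_memory_mb: int, task_vcores: int = 1) -> int:
    """Rough upper bound on task containers that fit on one node at once.

    Simplified model: a container fits only if both its memory and its
    vcore request fit into what the node still has to offer.
    """
    if task_memory_mb <= 0 or task_vcores <= 0:
        raise ValueError("task requests must be positive")
    by_memory = node_memory_mb // task_memory_mb
    by_vcores = node_vcores // task_vcores
    return min(by_memory, by_vcores)


if __name__ == "__main__":
    # Hypothetical node offering 32 GiB and 16 vcores to containers.
    print(containers_per_node(32768, 16, 2048))   # small request: 16 containers
    print(containers_per_node(32768, 16, 24576))  # oversized request: 1 container
```

With a small request, one job can spread many containers over the same host; with a request close to the node's total, only one container fits and the rest of the node sits idle.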
Explorer
Posts: 35
Registered: ‎11-24-2015

Re: slots/containers limitation


So if my scheduler is FIFO and all performance parameters such as child.ulimit, map.tasks.max, reduce.tasks.max, etc. (likewise, for YARN, the memory/vcores parameters) are configured adequately for the system's resources, will a single task running on a datanode consume the node's full resources if no other tasks are running?

 

So would that imply that when the job is kicked off, the JobTracker or ResourceManager, which looks for nodes and resources to run the job, will know from the start that a particular node has no jobs running and can therefore grab/assign as many resources as possible from that node? And will the resources grabbed for that single task (on that particular datanode) be in a single container or in multiple containers? Also, when another job kicks in, will the JobTracker/ResourceManager negotiate/take back some of the resources granted to the first job and give them to the new job?

 

Another question: with the Fair Scheduler, if I am not mistaken, the minimum guaranteed resource for a queue is taken up front and not given up. So in a case where there are four queues, each of which grabs a minimum of 10% of the available resources, a lone job on one of these queues can use only 70% of the total resources (3 * 10% minimum grabbed and held by the other three queues).
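
To spell out the arithmetic behind that 70% figure, here is a tiny sketch. It only encodes my assumption above (that idle queues hold on to their minimum share); it says nothing about what the scheduler really does, and the function name is made up for illustration.

```python
def share_left_for_lone_job(num_queues: int, min_share_pct: int) -> int:
    """Percent of the cluster left for one busy queue, ASSUMING every
    other queue keeps its minimum share reserved even while idle."""
    held_by_idle_queues = (num_queues - 1) * min_share_pct
    return 100 - held_by_idle_queues


if __name__ == "__main__":
    # Four queues with a 10% minimum each: 100 - 3 * 10
    print(share_left_for_lone_job(4, 10))  # 70
```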

 

Appreciate the clarification.
