
slots/containers limitation?


Explorer

If I have only one job running on a host (either MapReduce or YARN), will it be able to use all of the resources available on the host, or only one slot/container for map and one slot/container for reduce?

Appreciate the insights.


Re: slots/containers limitation

Master Guru
Jobs are not limited to a single container per host. Assuming there are no
scheduler-level restrictions and the resource requests are sized properly, your
job should be able to consume multiple containers on the same host.

Re: slots/containers limitation

Explorer

Thanks for the response.

Could you please explain this part further: "no scheduler level restrictions, and proper resource requests"?


Re: slots/containers limitation

Master Guru
You can configure queues in schedulers with various restrictions on
applications that land in such a queue. Please read
http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Config...

As to resource requests, I meant the properties of MR2 such as
http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/m...
These should be a small fraction of the value of
http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml#yarn.n...
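
To make the sizing point concrete, here is a rough sketch of how the per-task request relates to what a node offers. The function and parameter names are hypothetical stand-ins (they are not the property names behind the links above), and the sketch assumes the scheduler enforces both memory and vcores and ignores any rounding of requests:

```python
def containers_per_node(node_memory_mb, node_vcores, task_memory_mb, task_vcores=1):
    """Rough upper bound on task containers that fit on one node at once.

    All arguments are hypothetical illustration values, not real property names.
    """
    if task_memory_mb <= 0 or task_vcores <= 0:
        raise ValueError("task requests must be positive")
    by_memory = node_memory_mb // task_memory_mb   # how many fit by memory
    by_vcores = node_vcores // task_vcores         # how many fit by CPU
    return min(by_memory, by_vcores)               # the tighter limit wins


# Hypothetical node offering 64 GiB and 16 vcores to containers.
print(containers_per_node(65536, 16, 2048))  # 16: limited by vcores
print(containers_per_node(65536, 16, 8192))  # 8: limited by memory
```

The smaller each request is relative to the node total, the more containers of the same job can run side by side on that host.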

Re: slots/containers limitation

Explorer

So if my scheduler is FIFO and all performance parameters such as child.ulimit, map.tasks.max, reduce.tasks.max, etc. (likewise the memory/vcores parameters for YARN) are configured adequately for the system resources, will a single task running on a datanode consume the full resources if no other tasks are running?

So would that imply that when the job is kicked off, the JobTracker or ResourceManager, which looks for nodes and resources to run the job, will know from the start that a particular node has no jobs running and can therefore grab/assign as many resources as possible from that node? Will the resources grabbed for that single task (on that particular datanode) be in a single container or in multiple containers? Also, when another job kicks in, will the JobTracker/ResourceManager negotiate or take back some of the resources granted to the first job and give them to the new job?

Another question: with the Fair Scheduler, if I am not mistaken, the minimum guaranteed resource for a queue is taken up front and not given up. So if there are four queues that each grab a minimum of 10% of the available resources, a lone job on one of these queues can use only 70% of the total resources (3 * 10% minimum grabbed and held by the other three queues).
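
To spell out the arithmetic behind that 70% figure, here is a small sketch. It only encodes the premise stated above (that the other queues keep their minimum share reserved even when idle); it does not confirm that the scheduler really behaves this way, and the function and parameter names are made up for illustration:

```python
def lone_job_ceiling(num_queues, min_share_pct):
    """Percent of the cluster left for one queue IF every other queue
    kept its minimum share reserved (the premise of the question)."""
    held_by_others = (num_queues - 1) * min_share_pct
    return max(0, 100 - held_by_others)


# Four queues, each with a 10% minimum share.
print(lone_job_ceiling(4, 10))  # 70
```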

 

Appreciate the clarification.