12-15-2015 07:19 PM
If I have only one job running on a host (either MapReduce or YARN), will it be able to use all the resources available on that host, or only one slot/container for map and one slot/container for reduce?
Appreciate the insights.
12-16-2015 08:45 PM - edited 12-16-2015 08:52 PM
So if my scheduler is FIFO, and all performance parameters like child.ulimit, map.tasks.max, reduce.tasks.max, etc. (likewise the memory/vcores parameters in YARN) are configured adequately based on system resources, will a single task running on a datanode consume the full resources if no other tasks are running?
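For context, this is a sketch of the kind of sizing parameters the question refers to, in yarn-site.xml and mapred-site.xml. All property names are real YARN/MRv2 settings, but the values are purely illustrative numbers for a hypothetical node, not recommendations:

```xml
<!-- yarn-site.xml: resources each NodeManager offers to containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>49152</value> <!-- illustrative: 48 GB available for containers -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>16</value>
</property>

<!-- mapred-site.xml: per-task container sizes -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value> <!-- illustrative: each map task asks for a 2 GB container -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
```

With settings along these lines, how many containers a single job can place on an idle node is bounded by the node's advertised memory and vcores divided by the per-container request.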
So would that imply that when the job is kicked off, the JobTracker or ResourceManager, which looks for nodes and resources to run the job, will know at the start that a particular node has no jobs running, and so can grab/assign as many resources as possible from that node? And will the grabbed resources for that single task (on that particular datanode) be in a single container or multiple containers? Also, when another job kicks in, will the JobTracker/ResourceManager negotiate back some of the resources granted to the first job and give them to the new job?
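On the single-container-vs-multiple question, a small sketch of the arithmetic may help: in YARN (MRv2) each map or reduce task runs in its own container, so a lone job fills an idle node with many task-sized containers rather than one big one. The figures below (48 GB / 16 vcore node, 2 GB / 1 vcore per container) are made-up examples matching nothing in particular:

```python
def max_containers(node_mem_mb, node_vcores, container_mem_mb, container_vcores):
    """A node can host only as many concurrent containers as BOTH its
    memory budget and its vcore budget allow (whichever runs out first)."""
    return min(node_mem_mb // container_mem_mb, node_vcores // container_vcores)

# An otherwise idle node: one job's map tasks can occupy it with many
# small containers. Here memory would allow 24, but vcores cap it at 16.
print(max_containers(49152, 16, 2048, 1))  # -> 16
```

The same bound explains why a second job changes things: once other applications have demand, the scheduler hands newly freed containers to them instead of back to the first job.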
Another question: with the Fair Scheduler, if I am not mistaken, the minimum guaranteed resource for a queue is taken up front and not given up. So in a case where there are four queues, each grabbing a minimum of 10% of available resources, a lone job on one of these queues could only avail of 70% of the total resources (3 * 10% minimum grabbed and held by the other 3 queues).
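The 70% figure can be checked with simple arithmetic. Note this only reproduces the number under the assumption stated above, that idle queues' minimum shares are held back and never redistributed; whether the Fair Scheduler actually behaves that way is exactly what is being asked:

```python
def available_to_lone_job(total_pct, num_other_queues, min_share_pct):
    """Share left for the one active queue IF every other queue's minimum
    share were reserved even while idle (the assumption in question)."""
    return total_pct - num_other_queues * min_share_pct

# Four queues, 10% minimum each, one lone job: 100 - 3*10 = 70
print(available_to_lone_job(100, 3, 10))  # -> 70
```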
Appreciate the clarification.