I'm a bit confused about the YARN Fair Scheduler and would be glad if somebody could help me :-).
Let's say I run a Tez application (for example, a Hive query that boils down to a simple map/reduce) with the following parameters:
yarn.scheduler.maximum-allocation-vcores=16
yarn.scheduler.minimum-allocation-mb=2048
mapreduce.map.memory.mb=4096
mapreduce.reduce.memory.mb=8192
When I run my query, it will request the following from the ResourceManager:
- an ApplicationMaster container (including the DAG) (??? GB)
- a container on a NodeManager for the map task (2 x 2 GB)
- a container on a NodeManager for the reduce task (4 x 2 GB)
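To make the sizes in parentheses explicit, here is the arithmetic I have in mind as a small Python sketch. It assumes (this is my understanding, not something I have confirmed) that each request is counted in units of yarn.scheduler.minimum-allocation-mb, rounded up. The helper name units_needed is made up for illustration.

```python
# Sketch of my assumption: a memory request is expressed as a whole number
# of minimum-allocation units, rounded up. Not taken from the YARN source.

def units_needed(requested_mb: int, minimum_allocation_mb: int) -> int:
    """Return how many minimum-allocation units cover the request (ceiling)."""
    return -(-requested_mb // minimum_allocation_mb)  # integer ceiling division


MIN_MB = 2048  # yarn.scheduler.minimum-allocation-mb
requests = {
    "map": 4096,     # mapreduce.map.memory.mb
    "reduce": 8192,  # mapreduce.reduce.memory.mb
}

for task, mb in requests.items():
    units = units_needed(mb, MIN_MB)
    print(f"{task}: {units} x {MIN_MB} MB = {units * MIN_MB} MB")
```

With the values above this gives 2 units for the map task and 4 units for the reduce task.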
If the cluster is busy (not enough memory available), then with the FIFO scheduler my query will be queued until resources are freed (1 container for the AM, 1 container of 4 GB for the map task and 1 container of 8 GB for the reduce task).
First of all, am I right?
About the Fair Scheduler, I read: "With the Fair Scheduler, there is no need to reserve a set amount of capacity, since it will dynamically balance resources between all running jobs."
Does that mean I will not have to wait for three sufficiently large containers, and that I will instead get (for example) a 2 GB container for the map task and a 2 GB container for the reduce task if the average container size is 2 GB? In that case, would increasing mapreduce.map.memory.mb or mapreduce.reduce.memory.mb be pointless?
Is that so?
Thanks a lot for helping me see this more clearly :-)