These are the YARN parameters that control the minimum and
maximum container sizes YARN can allocate:
YARN PARAMETERS:
---->
yarn.scheduler.minimum-allocation-mb - The minimum allocation for every
container request at the RM, in MB. Memory requests lower than this won't take
effect; the request is raised to this minimum.
---->
yarn.scheduler.maximum-allocation-mb - The maximum allocation for every
container request at the RM, in MB. Memory requests higher than this are
rejected: YARN throws an InvalidResourceRequestException.
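These limits are set on the ResourceManager in yarn-site.xml. A minimal
sketch, assuming a 1 GB minimum and 8 GB maximum (the same values used in the
examples below):

<!-- yarn-site.xml on the ResourceManager; the values are just examples -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>   <!-- smallest container the RM will hand out -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>   <!-- largest container the RM will hand out -->
</property>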
MAPREDUCE PARAMETERS:
Client-side parameters that the job requests; we can override
these per job (see the command-line sketch after this list).
mapreduce.map.memory.mb - Map container size
mapreduce.reduce.memory.mb - Reducer container size
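These can be set in mapred-site.xml or overridden per job with generic -D
options. A sketch using the stock wordcount example (the jar name and the HDFS
paths are placeholders):

hadoop jar hadoop-mapreduce-examples.jar wordcount \
    -Dmapreduce.map.memory.mb=2048 \
    -Dmapreduce.reduce.memory.mb=4096 \
    /input /output

In practice the JVM heap (mapreduce.map.java.opts / mapreduce.reduce.java.opts,
e.g. -Xmx1638m) is usually kept somewhat below the container size so the JVM
fits inside the container.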
Note: If we request more memory than the YARN maximum allocation limit, the
job will fail because YARN will report that it cannot allocate that much memory.
Below are a few examples:
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Example (the following will fail):
+=============================+
Server side:
yarn.scheduler.minimum-allocation-mb=1024
yarn.scheduler.maximum-allocation-mb=8192
Client side:
mapreduce.map.memory.mb=10240
The 10240 MB request exceeds the 8192 MB maximum, so YARN rejects it and the job fails.
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Another example (the following will work):
+=============================+
Server side:
yarn.scheduler.minimum-allocation-mb=1024
yarn.scheduler.maximum-allocation-mb=8192
Client side:
mapreduce.map.memory.mb=800
In this case each mapper will get 1024 MB (the minimum container size), since requests below the minimum are raised to it.
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Another example (the following will work):
+=============================+
Server side:
yarn.scheduler.minimum-allocation-mb=1024
yarn.scheduler.maximum-allocation-mb=8192
Client side:
mapreduce.map.memory.mb=1800
In this case each mapper will get 2048 MB: the 1800 MB request is rounded up to the next multiple of the 1024 MB minimum allocation.
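The behaviour across all three examples can be sketched in a few lines of
Python. This is an illustration of the rounding shown above, not the actual
YARN code, and it assumes the common scheduler behaviour of rounding requests
up to a multiple of the minimum allocation:

import math

def normalize(request_mb, min_mb=1024, max_mb=8192):
    # Requests above the maximum are rejected outright,
    # mirroring YARN's InvalidResourceRequestException.
    if request_mb > max_mb:
        raise ValueError("requested %d MB > maximum allocation %d MB"
                         % (request_mb, max_mb))
    # Otherwise the request is rounded up to a multiple of the minimum.
    rounded = math.ceil(request_mb / min_mb) * min_mb
    return min(rounded, max_mb)

print(normalize(800))    # 1024 - raised to the minimum (second example)
print(normalize(1800))   # 2048 - rounded up to the next multiple (third example)
# normalize(10240) raises an error, matching the failing first example.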
Note: A single job can use one or many containers, depending on the size of the input data, the split size, and the nature of the data.
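As a rough illustration of how the container count falls out of the input and
split sizes, the sketch below mirrors FileInputFormat's split computation,
max(minSplit, min(maxSplit, blockSize)); the small slop factor Hadoop applies
to the last split is omitted, and all values are examples:

import math

def split_size_mb(block_mb=128, min_split_mb=1, max_split_mb=None):
    # Mirrors FileInputFormat.computeSplitSize(): max(min, min(max, block)).
    if max_split_mb is None:
        max_split_mb = block_mb
    return max(min_split_mb, min(max_split_mb, block_mb))

def num_map_containers(input_mb, block_mb=128):
    # One map task, and hence one container, per input split.
    return math.ceil(input_mb / split_size_mb(block_mb))

print(num_map_containers(1000))  # 8 map containers for 1000 MB at 128 MB splits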