Created 08-03-2016 06:02 AM
Hi,
I am running a cluster with 15 data nodes, 15 region servers and 16 node managers (plus, of course, the name node, secondary name node, active HMaster and resource manager). All the machines are m3.large instances, so basically 2 cores and 7.5GB of RAM each.
By default it allocates 32GB for the YARN memory and 1 vcore. Here is my default configuration; it uses the DefaultResourceCalculator.
yarn.scheduler.minimum-allocation-mb: 682
yarn.scheduler.maximum-allocation-mb: 2048
yarn.nodemanager.resource.cpu-vcores : 1
yarn.nodemanager.resource.memory-mb: 2048
When I run a MapReduce job it takes about 30 minutes to complete, and during that time the YARN memory utilization was high, so I thought YARN memory was the issue and doubled the sizes as below.
yarn.scheduler.minimum-allocation-mb: 1024
yarn.scheduler.maximum-allocation-mb: 4096
yarn.nodemanager.resource.cpu-vcores : 1
yarn.nodemanager.resource.memory-mb: 4096
Now the YARN memory has increased from 32GB to 64GB, but when I run the same MapReduce job with the newer configuration it takes around 42 minutes; even though all 64GB of YARN memory is available, the cluster seems slower than before. So I would like to understand container resource allocation and why it slowed down after I increased the memory. I would also like to see how many containers I get per cluster and per node (any calculation). Please suggest the recommended configuration for this case.
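For what it's worth, here is my own rough attempt at that container math (a back-of-the-envelope sketch in Python; it assumes each container is requested at the scheduler minimum and ignores vcores, since DefaultResourceCalculator schedules on memory only):

# Rough math: total YARN memory = per-node NodeManager memory * number of NodeManagers,
# and containers per node = NodeManager memory // container size (memory-only scheduling).
NODE_MANAGERS = 16

def cluster_summary(nm_memory_mb, container_mb):
    containers_per_node = nm_memory_mb // container_mb
    return {
        "total_yarn_memory_gb": nm_memory_mb * NODE_MANAGERS / 1024,
        "containers_per_node": containers_per_node,
        "containers_per_cluster": containers_per_node * NODE_MANAGERS,
    }

print(cluster_summary(2048, 682))   # old config: ~32 GB total, 3 containers/node, 48/cluster
print(cluster_summary(4096, 1024))  # new config: ~64 GB total, 4 containers/node, 64/cluster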
Thanks
Arun
Created 08-03-2016 06:22 AM
Here is the link for calculating YARN memory (a sketch of that calculation follows the questions below):
1) How many data nodes do you have?
2) How many disks do you have in each data node?
3) Did you install HBase?
4) How many Cores do you have on each data node?
5) RAM size on each data node?
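The five questions above are exactly the inputs that calculation needs. A simplified sketch of it in Python (my paraphrase of the common HDP sizing heuristic, not an official script, so treat the reserved-memory and container-count rules below as rough assumptions):

def recommend_yarn_memory(ram_gb, cores, disks, hbase_installed):
    # Rough assumption: reserve some RAM for the OS, and more if HBase runs on the node.
    reserved_gb = 2 if ram_gb <= 8 else 4
    if hbase_installed:
        reserved_gb += 2 if ram_gb <= 8 else 4
    available_mb = int(max(ram_gb - reserved_gb, 1) * 1024)

    # Minimum container size scales with the total RAM on the node.
    if ram_gb <= 4:
        min_container_mb = 256
    elif ram_gb <= 8:
        min_container_mb = 512
    elif ram_gb <= 24:
        min_container_mb = 1024
    else:
        min_container_mb = 2048

    # Container count is bounded by cores, by disks, and by available memory.
    containers = max(1, int(min(2 * cores, 1.8 * disks, available_mb / min_container_mb)))
    ram_per_container_mb = int(max(min_container_mb, available_mb // containers))

    return {
        "yarn.nodemanager.resource.memory-mb": containers * ram_per_container_mb,
        "yarn.scheduler.minimum-allocation-mb": ram_per_container_mb,
        "yarn.scheduler.maximum-allocation-mb": containers * ram_per_container_mb,
    }

# Example with the numbers from this thread: 7.5 GB RAM, 2 cores, 2 disks, HBase installed.
print(recommend_yarn_memory(7.5, 2, 2, True))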
Created 08-03-2016 06:34 AM
Please find my cluster details in my first post; also, I have 2 disks per node.
The document recommends the configuration below, which is the same as what I did in my newer configuration.
yarn.scheduler.minimum-allocation-mb=1024
yarn.scheduler.maximum-allocation-mb=4096
yarn.nodemanager.resource.memory-mb=4096
mapreduce.map.memory.mb=512
mapreduce.map.java.opts=-Xmx409m
mapreduce.reduce.memory.mb=1024
mapreduce.reduce.java.opts=-Xmx819m
yarn.app.mapreduce.am.resource.mb=512
yarn.app.mapreduce.am.command-opts=-Xmx409m
mapreduce.task.io.sort.mb=204
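As far as I can tell, those heap and sort values just follow the common rule of thumb of setting -Xmx to about 80% of the container memory and mapreduce.task.io.sort.mb to about 40% of the map container (my assumption about where the 409/819/204 values come from); a quick check:

# Quick check of the recommended values against the usual 0.8 / 0.4 rules of thumb.
map_mb, reduce_mb, am_mb = 512, 1024, 512

print("mapreduce.map.java.opts            = -Xmx%dm" % int(map_mb * 0.8))     # -Xmx409m
print("mapreduce.reduce.java.opts         = -Xmx%dm" % int(reduce_mb * 0.8))  # -Xmx819m
print("yarn.app.mapreduce.am.command-opts = -Xmx%dm" % int(am_mb * 0.8))      # -Xmx409m
print("mapreduce.task.io.sort.mb          = %d" % int(map_mb * 0.4))          # 204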
Thanks
Created 08-03-2016 06:37 AM
Basically, you increased your YARN memory from 32GB to 64GB, which means you increased the total memory available to containers. A container is the unit in which YARN allocates resources (CPU and RAM) to submitted jobs.
You increased the YARN container size, but what about the Tez container size?
--> Ideally, the Tez container size should be a multiple of the YARN minimum container memory.
--> Ideally, we can allocate up to two containers per disk and per core, as in the quick check below.
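For example, applying that rule of thumb to the nodes in this thread (a rough estimate only; the real count also depends on the container sizes your jobs actually request):

cores, disks, node_managers = 2, 2, 16
containers_per_node = min(2 * cores, 2 * disks)
print(containers_per_node)                  # roughly 4 containers per node
print(containers_per_node * node_managers)  # roughly 64 containers across the cluster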