Created 08-03-2016 06:02 AM
Hi,
I am running a cluster with 15 data nodes, 15 region servers and 16 node managers (plus, of course, the name node, secondary name node, active HMaster and resource manager). All the machines are m3.large instances, so basically 2 cores and 7.5GB of RAM each.
By default it allocates 32GB for the YARN memory and 1 vcore. Here is my default configuration; it uses the DefaultResourceCalculator.
yarn.scheduler.minimum-allocation-mb: 682
yarn.scheduler.maximum-allocation-mb: 2048
yarn.nodemanager.resource.cpu-vcores : 1
yarn.nodemanager.resource.memory-mb: 2048
When I run a MapReduce job it takes about 30 minutes to complete, and during that time the YARN memory utilization was high, so I thought YARN memory was the issue and doubled the sizes as below.
yarn.scheduler.minimum-allocation-mb: 1024
yarn.scheduler.maximum-allocation-mb: 4096
yarn.nodemanager.resource.cpu-vcores : 1
yarn.nodemanager.resource.memory-mb: 4096
Now the YARN memory has increased from 32GB to 64GB, but when I run the same MapReduce job with the newer configuration it takes around 42 minutes; even though all 64GB of YARN memory is available, the cluster seems slower than before. So I would like to understand container resource allocation and why it slowed down after I increased the memory. I would also like to see how many containers I get per cluster and per node (any calculation). Please suggest the recommended configuration for this case.
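For what it's worth, here is my own rough attempt at that container math (a back-of-the-envelope sketch in Python; it assumes each container is requested at the scheduler minimum and ignores vcores, since DefaultResourceCalculator schedules on memory only):

# Rough math: total YARN memory = per-node NodeManager memory * number of NodeManagers,
# and containers per node = NodeManager memory // container size (memory-only scheduling).
NODE_MANAGERS = 16

def cluster_summary(nm_memory_mb, container_mb):
    containers_per_node = nm_memory_mb // container_mb
    return {
        "total_yarn_memory_gb": nm_memory_mb * NODE_MANAGERS / 1024,
        "containers_per_node": containers_per_node,
        "containers_per_cluster": containers_per_node * NODE_MANAGERS,
    }

print(cluster_summary(2048, 682))   # old config: ~32 GB total, 3 containers/node, 48/cluster
print(cluster_summary(4096, 1024))  # new config: ~64 GB total, 4 containers/node, 64/cluster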
Thanks
Arun
Created 08-03-2016 06:22 AM
Here is the link for calculating YARN memory (a sketch of that calculation follows the questions below):
1) How many data nodes do you have?
2) How many disks do you have in each data node?
3) Did you install HBase?
4) How many Cores do you have on each data node?
5) RAM size on each data node?
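The five questions above are exactly the inputs that calculation needs. A simplified sketch of it in Python (my paraphrase of the common HDP sizing heuristic, not an official script, so treat the reserved-memory and container-count rules below as rough assumptions):

def recommend_yarn_memory(ram_gb, cores, disks, hbase_installed):
    # Rough assumption: reserve some RAM for the OS, and more if HBase runs on the node.
    reserved_gb = 2 if ram_gb <= 8 else 4
    if hbase_installed:
        reserved_gb += 2 if ram_gb <= 8 else 4
    available_mb = int(max(ram_gb - reserved_gb, 1) * 1024)

    # Minimum container size scales with the total RAM on the node.
    if ram_gb <= 4:
        min_container_mb = 256
    elif ram_gb <= 8:
        min_container_mb = 512
    elif ram_gb <= 24:
        min_container_mb = 1024
    else:
        min_container_mb = 2048

    # Container count is bounded by cores, by disks, and by available memory.
    containers = max(1, int(min(2 * cores, 1.8 * disks, available_mb / min_container_mb)))
    ram_per_container_mb = int(max(min_container_mb, available_mb // containers))

    return {
        "yarn.nodemanager.resource.memory-mb": containers * ram_per_container_mb,
        "yarn.scheduler.minimum-allocation-mb": ram_per_container_mb,
        "yarn.scheduler.maximum-allocation-mb": containers * ram_per_container_mb,
    }

# Example with the numbers from this thread: 7.5 GB RAM, 2 cores, 2 disks, HBase installed.
print(recommend_yarn_memory(7.5, 2, 2, True))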
Created 08-03-2016 06:34 AM
Please find my cluster details in my first post; also, I have 2 disks per node.
The document recommends the configuration below, which is the same as what I did in my newer configuration.
yarn.scheduler.minimum-allocation-mb=1024
yarn.scheduler.maximum-allocation-mb=4096
yarn.nodemanager.resource.memory-mb=4096
mapreduce.map.memory.mb=512
mapreduce.map.java.opts=-Xmx409m
mapreduce.reduce.memory.mb=1024
mapreduce.reduce.java.opts=-Xmx819m
yarn.app.mapreduce.am.resource.mb=512
yarn.app.mapreduce.am.command-opts=-Xmx409m
mapreduce.task.io.sort.mb=204
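As far as I can tell, those heap and sort values just follow the common rule of thumb of setting -Xmx to about 80% of the container memory and mapreduce.task.io.sort.mb to about 40% of the map container (my assumption about where the 409/819/204 values come from); a quick check:

# Quick check of the recommended values against the usual 0.8 / 0.4 rules of thumb.
map_mb, reduce_mb, am_mb = 512, 1024, 512

print("mapreduce.map.java.opts            = -Xmx%dm" % int(map_mb * 0.8))     # -Xmx409m
print("mapreduce.reduce.java.opts         = -Xmx%dm" % int(reduce_mb * 0.8))  # -Xmx819m
print("yarn.app.mapreduce.am.command-opts = -Xmx%dm" % int(am_mb * 0.8))      # -Xmx409m
print("mapreduce.task.io.sort.mb          = %d" % int(map_mb * 0.4))          # 204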
Thanks
Created 08-03-2016 06:37 AM
Basically, you increased your YARN memory from 32GB to 64GB, which means you increased the total memory available to containers. A container is the unit in which YARN allocates resources (CPU and RAM) to submitted jobs.
You increased the YARN container size, but what about the Tez container size?
--> Ideally, the Tez container size should be a multiple of the YARN minimum container memory.
--> Ideally, we can allocate up to two containers per disk and per core, as in the quick check below.
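For example, applying that rule of thumb to the nodes in this thread (a rough estimate only; the real count also depends on the container sizes your jobs actually request):

cores, disks, node_managers = 2, 2, 16
containers_per_node = min(2 * cores, 2 * disks)
print(containers_per_node)                  # roughly 4 containers per node
print(containers_per_node * node_managers)  # roughly 64 containers across the cluster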