
Doubts about YARN memory configuration in Cloudera Manager


Expert Contributor

 

Hi everyone, I have a cluster where each worker has 110 GB of RAM.

In Cloudera Manager I've configured the following YARN memory parameters:

 

yarn.nodemanager.resource.memory-mb 80 GB
yarn.scheduler.minimum-allocation-mb 1 GB
yarn.scheduler.maximum-allocation-mb 20 GB
mapreduce.map.memory.mb 0
mapreduce.reduce.memory.mb 0
yarn.app.mapreduce.am.resource.mb 1 GB
mapreduce.job.heap.memory-mb.ratio 0.8
mapreduce.map.java.opts -Djava.net.preferIPv4Stack=true
mapreduce.reduce.java.opts -Djava.net.preferIPv4Stack=true
Map Task Maximum Heap Size 0
Reduce Task Maximum Heap Size 0


One of my goals was to let YARN automatically choose the correct Java heap size for jobs, using the 0.8 ratio as the upper bound (20 GB * 0.8 = 16 GB), so I've left all the heap and mapper/reducer settings at zero.
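To make the expectation concrete, here is a small sketch (my own arithmetic, assuming the heap is simply the container size times the heap ratio) of the maximum heap I thought YARN would derive from these settings:

```python
# Sketch of the heap sizing I expected YARN to derive automatically.
# Assumption: heap = container size * mapreduce.job.heap.memory-mb.ratio.
max_container_mb = 20 * 1024   # yarn.scheduler.maximum-allocation-mb (20 GB)
heap_ratio = 0.8               # mapreduce.job.heap.memory-mb.ratio

max_heap_mb = int(max_container_mb * heap_ratio)
print(max_heap_mb)             # 16384 MB, i.e. 16 GB
```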

 

I have a Hive job which performs some joins between large tables. Just running the job as it is, I get a failure:

 

Container [pid=26783,containerID=container_1389136889967_0009_01_000002] is running 
beyond physical memory limits.
Current usage: 2.7 GB of 2 GB physical memory used; 3.7 GB of 3 GB virtual memory used.
Killing container.

If I explicitly set the memory requirements for the job in the Hive script, it completes successfully:

 

SET mapreduce.map.memory.mb=8192;
SET mapreduce.reduce.memory.mb=16384;
SET mapreduce.map.java.opts=-Xmx6553m;
SET mapreduce.reduce.java.opts=-Xmx13106m;
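For what it's worth, the explicit -Xmx values above line up with roughly 0.8 of the requested container sizes (a quick check of my own, not output from any Hadoop tool):

```python
# The -Xmx values relate to the container sizes as roughly 0.8x.
map_container_mb = 8192       # mapreduce.map.memory.mb
reduce_container_mb = 16384   # mapreduce.reduce.memory.mb
ratio = 0.8

print(int(map_container_mb * ratio))     # 6553 -> matches -Xmx6553m
print(int(reduce_container_mb * ratio))  # 13107 -> close to the -Xmx13106m used above
```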

My question: why doesn't YARN automatically give this job enough memory to complete successfully?

Since I have specified 20 GB as the maximum container size and 0.8 as the maximum heap ratio, I was expecting that YARN could give a maximum of 16 GB to each mapper/reducer without my having to explicitly specify these parameters.

 

Could someone please explain what's going on?

 

Thanks for any information.

 

4 REPLIES

Re: Doubts about YARN memory configuration in Cloudera Manager

Rising Star

Hi,

 

yarn.scheduler.maximum-allocation-mb is set to 20 GB, which is the largest amount of physical memory that can be requested for a single container; yarn.scheduler.minimum-allocation-mb is the smallest amount of physical memory that can be requested for a container.

 

When we submit an MR job, the requested container memory is taken from mapreduce.map.memory.mb, which defaults to 1 GB. If it is not specified, we will be given a container of 1 GB (the same applies to the reducer).

 

This can be verified in the YARN logs:

 

mapreduce.map.memory.mb - requested container memory, 1 GB:

INFO [Thread-52] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: mapResourceRequest:<memory:1024, vCores:1>

 

mapreduce.map.java.opts - which defaults to 80% of the container memory:

org.apache.hadoop.mapred.JobConf: Task java-opts do not specify heap size. Setting task attempt jvm max heap size to -Xmx820m
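The -Xmx820m in that log line is consistent with the default 1 GB container times the default 0.8 heap ratio; here is a quick sketch (the round-up to a whole megabyte is my assumption about the rounding):

```python
import math

# Where -Xmx820m comes from when no heap size is given in java-opts:
# default 1024 MB container * default 0.8 heap ratio, rounded up.
default_container_mb = 1024   # mapreduce.map.memory.mb default
default_heap_ratio = 0.8      # mapreduce.job.heap.memory-mb.ratio default

heap_mb = math.ceil(default_container_mb * default_heap_ratio)
print(heap_mb)                # 820 -> matches "-Xmx820m" in the log
```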


1 GB is the default, and it is quite low. I recommend reading the link below. It provides a good understanding of YARN and MR memory settings, how they relate, and how to establish baseline settings based on the cluster node size (disk, memory, and cores).

 

https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_yarn_tuning.html


Hope it helps. Let us know if you have any questions.

 

Thanks

Jerry

Re: Doubts about YARN memory configuration in Cloudera Manager

Expert Contributor

Hi @Jerry, thank you for the reply.

 

If I understand correctly, you are saying that if no explicit values are set for mapreduce.map.memory.mb and mapreduce.reduce.memory.mb, YARN will assign the job the minimum container memory, yarn.scheduler.minimum-allocation-mb (1 GB in this case)?

 

Because from what I can read in the description fields in Cloudera Manager, I thought that if the values for mapreduce.map.memory.mb and mapreduce.reduce.memory.mb are left at zero, the memory assigned to a job should be inferred from the maximum heap size and the heap-to-container ratio:

 

Screenshot 2018-12-16 at 10.16.17.png

 

Could you please explain how this works?


Re: Doubts about YARN memory configuration in Cloudera Manager

Champion

Could you let me know the CM / CDH version you are running?

Re: Doubts about YARN memory configuration in Cloudera Manager

Expert Contributor

 

Hi @csguna, the CDH version is 5.13.2.
