Support Questions


YARN memory configuration for machines with different RAM sizes?

Rising Star

Hi, I'm confused about how to configure memory in a YARN cluster.

So far, I have some machines, each with 64 GB of physical memory. Because all machines have 64 GB, I can set a uniform yarn.nodemanager.resource.memory-mb=60GB.

If I want to add new and better machines with 128 GB of physical memory, how should I set the memory configuration in YARN?

If I set yarn.nodemanager.resource.memory-mb=120GB, will this affect the old machines?

If I set yarn.nodemanager.resource.memory-mb=60GB, will this waste the new machines' resources?

Is there a relative ratio to fit each machine adaptively? For example, setting yarn.nodemanager.resource.memory-mb=0.8*physical_memory?
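To illustrate the ratio I have in mind: since yarn.nodemanager.resource.memory-mb itself only takes a fixed number of MB, the 80% value would have to be computed per host, roughly like this sketch:

```shell
# Sketch of the ratio idea: 80% of physical RAM, converted from KB to MB,
# because yarn.nodemanager.resource.memory-mb takes a plain MB value.
yarn_mem_mb() {
  echo $(( $1 * 80 / 100 / 1024 ))   # $1 = MemTotal in KB
}

# On a live node: yarn_mem_mb "$(awk '/MemTotal/ {print $2}' /proc/meminfo)"
yarn_mem_mb 67108864   # 64 GB node -> 52428 MB (~51 GB)
```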

Thanks.

1 ACCEPTED SOLUTION

Mentor
The configs you are targeting are local to each NodeManager. You can configure each NodeManager differently and still have a working (heterogeneous) cluster. The ResourceManager views the cluster as an aggregate, but it takes each NodeManager's resource limit from the value that NM publishes when it registers.

If you are using Cloudera Manager to install and manage your cluster, then you are looking for the concept of 'Role Groups', as explained here: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_role_groups.html. The role groups layer lets you divide your hosts and their roles into groups with differing configurations. In your case you can split the hosts in two and configure each group's yarn.nodemanager.resource.memory-mb separately (60 GB and 120 GB, as appropriate).

If you do not use Cloudera Manager (I highly recommend it), you can also manage each NodeManager's local yarn-site.xml separately -- i.e. keep two distinct copies: one with 60 GB placed only on the older hosts, and one with 120 GB placed only on the newer hosts.
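For the manual route, a minimal yarn-site.xml fragment for the newer hosts could look like the sketch below. Note the property takes a value in MB, so 120 GB is written as 122880:

```xml
<!-- yarn-site.xml, placed on the 128 GB hosts only (122880 MB = 120 GB).
     The 64 GB hosts keep their own copy with 61440 (= 60 GB). -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>122880</value>
</property>
```

Restart each NodeManager after the change so it re-registers with the ResourceManager using the new limit.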


4 REPLIES 4


Rising Star
Yes, I'm using Cloudera Manager. I'll follow your link. Thanks.
Another question: I find that the HDFS NameNodes exit frequently because of a lack of resources.
How should I reserve enough resources for them so they are not starved by YARN?

Mentor
I recommend filing one topic per question, so you reach the right person for each topic.

Ideally the NameNode should live on a separate host; if not, it should at the very least be given enough free cores, a dedicated disk, and a slice of RAM for its heap. I'm not sure what exits you are seeing -- could you add some log snippets or screenshots showing the problem you observe?

Rising Star
Thanks. I'll try your advice first, and create a new topic if it still can't be solved.