Created 05-09-2016 12:49 PM
Hello guys,
I have a SUSE 11 SP4 machine where I have installed and configured HDP 2.3, YARN, MapReduce, etc. I made no modifications during the installation through the Ambari UI; I just kept clicking Next. I am using an Amazon image of size m4.xlarge, which means 4 vCPUs and 16 GiB of memory. The YARN version is 2.7.1.2.3.
When I open the
/etc/hadoop/2.3.4.7-4/0/yarn-site.xml
I see the following entries there:
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>3</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>3</value>
</property>
My question is: how is the "yarn.nodemanager.resource.cpu-vcores" value determined? I thought this value was tied to the number of vCPUs, which in this case is 4.
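For anyone who wants to compare these values across hosts, here is a minimal Python sketch that pulls the vcore-related properties out of a yarn-site.xml document. It assumes the usual Hadoop layout, where the <property> elements sit inside a <configuration> root element; the sample string and the function name read_vcore_settings are made up for illustration and simply mirror the entries quoted above.

```python
import xml.etree.ElementTree as ET

# Sample content mirroring the entries quoted in the question. The
# <configuration> wrapper is an assumption about the surrounding file.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>3</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>3</value>
  </property>
</configuration>"""


def read_vcore_settings(xml_text):
    """Return {property name: int value} for every property ending in 'vcores'."""
    root = ET.fromstring(xml_text)
    settings = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="")
        if name.endswith("vcores"):
            settings[name] = int(prop.findtext("value"))
    return settings


if __name__ == "__main__":
    for name, value in sorted(read_vcore_settings(SAMPLE).items()):
        print(f"{name} = {value}")
```

To run it against a real host, read the file contents into a string and pass that to read_vcore_settings instead of SAMPLE.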
Created 05-09-2016 02:54 PM
@Elitsa Milanova By default, yarn.nodemanager.resource.cpu-vcores is set to roughly 80% of the total vCPUs available on the machine. AFAIK, an internal Ambari script picks this default based on that calculation. It may not always be the best choice, though, depending on which non-YARN components you are running on the machine, the OS requirements, etc. These default configs are a starting point and will need tuning depending on the use case, workload requirements, and cluster/host specifications.
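As a rough illustration of the "~80%" rule of thumb described above, here is a small Python sketch. It is not Ambari's actual script: the function name estimate_default_vcores, rounding down, and the floor of 1 are all assumptions made for the example.

```python
def estimate_default_vcores(host_vcpus, percent=80):
    """Hypothetical helper: take roughly `percent` of the host vCPUs.

    Rounding down and never going below 1 are assumptions for this sketch,
    not something confirmed from Ambari's code.
    """
    if host_vcpus < 1:
        raise ValueError("host_vcpus must be at least 1")
    # Integer arithmetic avoids floating-point surprises.
    return max(1, host_vcpus * percent // 100)


if __name__ == "__main__":
    # m4.xlarge from the question has 4 vCPUs.
    print(estimate_default_vcores(4))  # prints 3
```

With 4 vCPUs this gives 3, which matches the value in the yarn-site.xml quoted in the question.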
Created 05-09-2016 01:16 PM
Check these blog posts to get a fair idea of the values:
http://hortonworks.com/blog/managing-cpu-resources-in-your-hadoop-yarn-clusters/
http://crazyadmins.com/tag/tuning-yarn-to-get-maximum-performance/
Created 05-09-2016 02:06 PM
Hi @Sagar Shimpi,
Thank you for the posts.
I get the idea of how to configure the values, but what I still do not understand is why, when I use c4.xlarge, for example, the value is set to 3, and when I use m3.2xlarge it is set to 1, keeping in mind that I made no explicit configuration changes and the configuration does not differ between the two hosts. How is this default value set? If it is a default, why is it not 1 every time? 🙂
Created 05-10-2016 08:27 AM
Thank you, @Pardeep and @Sagar Shimpi!
From the articles above and from your replies, I have managed to put together a short summary. I am posting it in case somebody else wonders about this 🙂
"To handle the variety of CPU-intensive workloads, YARN introduced a concept called "vcores" (short for virtual cores). A vcore is a usage share of a host CPU that the YARN NodeManager allocates so that the available resources are used as efficiently as possible. The number of vcores available to YARN containers has to be set by an administrator in yarn-site.xml on each node. What it should be set to depends on the type of workloads running in the cluster and the type of hardware available. The general recommendation is to set it to the number of physical cores on the node, but administrators can increase it if they wish to run additional containers on nodes with faster CPUs. To enable CPU scheduling, there are some configuration properties that administrators and users need to be aware of:
“yarn.scheduler.maximum-allocation-vcores” controls the maximum vcores that any submitted job can request. “yarn.nodemanager.resource.cpu-vcores” controls how many vcores can be scheduled on a particular NodeManager instance. So “yarn.nodemanager.resource.cpu-vcores” can vary from host to host (NodeManager to NodeManager), while “yarn.scheduler.maximum-allocation-vcores” is a global property of the scheduler."
Further information can also be found here: https://community.cloudera.com/t5/Cloudera-Manager-Installation/yarn-nodemanager-resource-cpu-vcores...
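To make the difference between the per-node and the scheduler-wide property in the summary a bit more concrete, here is a simplified Python sketch. It only models the two limits described above; it ignores vcores already in use, memory, and everything else the real scheduler considers. The host names node-a and node-b and the function name candidate_nodes are hypothetical.

```python
# yarn.scheduler.maximum-allocation-vcores: one value for the whole scheduler.
SCHEDULER_MAX_VCORES = 3

# yarn.nodemanager.resource.cpu-vcores: may differ per NodeManager.
# The host names are hypothetical.
NODE_VCORES = {"node-a": 3, "node-b": 1}


def candidate_nodes(requested, node_vcores=NODE_VCORES,
                    scheduler_max=SCHEDULER_MAX_VCORES):
    """Return the nodes whose configured vcores could hold one container.

    Simplification: assumes idle nodes and looks only at vcores.
    """
    if requested > scheduler_max:
        raise ValueError(
            f"request for {requested} vcores exceeds the scheduler maximum "
            f"of {scheduler_max}"
        )
    return sorted(host for host, cap in node_vcores.items() if cap >= requested)


if __name__ == "__main__":
    print(candidate_nodes(1))  # both nodes qualify
    print(candidate_nodes(2))  # only node-a qualifies
```

A request for 4 vcores would be refused in this sketch regardless of the nodes, because it exceeds the scheduler-wide maximum.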