Created on 08-11-2023 08:26 AM - edited 08-11-2023 08:30 AM
Hello,
I'm new to using node labels on YARN. I have successfully set up the labels, but the scheduler is allocating all resources to the DEFAULT_PARTITION under "Effective Capacity" and 0 resources to the labeled partition.
As the screenshots illustrate, the NodeManager is launching with the correct label and has the correct resources assigned to that label. However, applications assigned to that label will not start: although the partition itself has resources assigned to it, the queues under the partition do not. Here's my capacity-scheduler.xml:
<configuration>
  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>1.0</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,spark</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>[memory=11776,vcores=4]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>[memory=11776,vcores=4]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
    <value>node</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.accessible-node-labels.node.capacity</name>
    <value>[memory=11776,vcores=4]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.accessible-node-labels.node.maximum-capacity</name>
    <value>[memory=11776,vcores=4]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.default-node-label-expression</name>
    <value>node</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.default-application-priority</name>
    <value>9</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.disable_preemption</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.spark.capacity</name>
    <value>[memory=4096,vcores=1]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.spark.maximum-capacity</name>
    <value>[memory=4096,vcores=1]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.spark.accessible-node-labels</name>
    <value>node</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.spark.accessible-node-labels.node.capacity</name>
    <value>[memory=4096,vcores=1]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.spark.accessible-node-labels.node.maximum-capacity</name>
    <value>[memory=4096,vcores=1]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.spark.default-application-priority</name>
    <value>9</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.spark.disable_preemption</name>
    <value>true</value>
  </property>
</configuration>
And here are the relevant parts of yarn-site.xml:
<property>
  <name>yarn.node-labels.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.node-labels.configuration-type</name>
  <value>distributed</value>
</property>
<property>
  <name>yarn.node-labels.fs-store.root-dir</name>
  <value>hdfs://xxx:9000/user/yarn/node-labels/</value>
</property>
<property>
  <name>yarn.nodemanager.node-labels.provider</name>
  <value>config</value>
</property>
<property>
  <name>yarn.nodemanager.node-labels.provider.configured-node-partition</name>
  <value>node</value>
</property>
I'm using Hadoop 3.3.4 built from source. In case it matters, this is in my dev environment with a single ResourceManager and NodeManager. Any suggestions are much appreciated. Thanks!
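In case it's useful, this is how I've been verifying the label setup (the RM address and node ID below are placeholders for my environment):

# List the node labels known to the cluster
yarn cluster --list-node-labels

# Confirm which label a NodeManager registered with (Node-Labels appears in the report)
yarn node -status <node-id>

# Inspect per-partition queue capacities as the scheduler sees them
curl http://<rm-host>:8088/ws/v1/cluster/scheduler

The first two commands look correct on my cluster; it's the scheduler output that shows the labeled partition with 0 resources at the queue level.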
Created on 08-18-2023 05:01 AM - edited 08-18-2023 05:04 AM
This was caused by my overlooking that "root" is an actual queue: I hadn't given it the label accessibility and capacity settings that it needs to pass down to its child queues. The configuration in the write-up here tipped me off: https://www.ibm.com/support/pages/yarn-node-labels-label-based-scheduling-and-resource-isolation-had...
Here is the full configuration that gives me the desired behaviour. The fix is the new root-level properties near the top: root.accessible-node-labels, root.accessible-node-labels.node.capacity, and root.accessible-node-labels.node.maximum-capacity (plus explicit root.capacity and root.maximum-capacity):
<configuration>
  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>1.0</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.accessible-node-labels</name>
    <value>*</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.maximum-capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.accessible-node-labels.node.capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.accessible-node-labels.node.maximum-capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,spark</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>[memory=11776,vcores=4]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>[memory=11776,vcores=4]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
    <value>node</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.default-node-label-expression</name>
    <value>node</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.accessible-node-labels.node.capacity</name>
    <value>[memory=11776,vcores=4]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.accessible-node-labels.node.maximum-capacity</name>
    <value>[memory=11776,vcores=4]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.default-application-priority</name>
    <value>9</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.disable_preemption</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.spark.capacity</name>
    <value>[memory=4096,vcores=1]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.spark.maximum-capacity</name>
    <value>[memory=4096,vcores=1]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.spark.accessible-node-labels</name>
    <value>node</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.spark.accessible-node-labels.node.capacity</name>
    <value>[memory=4096,vcores=1]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.spark.accessible-node-labels.node.maximum-capacity</name>
    <value>[memory=4096,vcores=1]</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.spark.default-application-priority</name>
    <value>9</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.spark.disable_preemption</name>
    <value>true</value>
  </property>
</configuration>
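With this in place, a job can be pinned to the labeled partition. Since the spark queue does not set a default-node-label-expression, the label has to be requested explicitly on submission; a minimal sketch (the example class and jar are just stand-ins):

spark-submit \
  --master yarn \
  --queue spark \
  --conf spark.yarn.am.nodeLabelExpression=node \
  --conf spark.yarn.executor.nodeLabelExpression=node \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 100

Jobs submitted to the default queue shouldn't need the extra flags, because that queue sets default-node-label-expression=node.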