YARN Node Labels - Effective Capacity is 0% on labeled partition and infinity% on DEFAULT_PARTITION

Explorer

Hello,

I'm new to using labels on YARN nodes. I have successfully set up the labels but the scheduler is allocating all resources to the DEFAULT_PARTITION under "Effective Capacity" and 0 resources to the labeled partition.

[Screenshots attached: yarn_scheduler.jpg, node_labels.jpg, yarn_app.jpg]

As the screenshots illustrate, the NodeManager launches with the correct label and the correct resources are assigned to that partition. However, applications assigned to that label will not start: the partition itself has resources, but the queues under the partition show none. Here's my capacity-scheduler.xml:

<configuration>
    <property>
        <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
        <value>1.0</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.resource-calculator</name>
        <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>default,spark</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.capacity</name>
        <value>[memory=11776,vcores=4]</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
        <value>[memory=11776,vcores=4]</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
        <value>node</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.accessible-node-labels.node.capacity</name>
        <value>[memory=11776,vcores=4]</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.accessible-node-labels.node.maximum-capacity</name>
        <value>[memory=11776,vcores=4]</value>
    </property>
   <property>
        <name>yarn.scheduler.capacity.root.default.default-node-label-expression</name>
        <value>node</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.default-application-priority</name>
        <value>9</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.disable_preemption</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.spark.capacity</name>
        <value>[memory=4096,vcores=1]</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.spark.maximum-capacity</name>
        <value>[memory=4096,vcores=1]</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.spark.accessible-node-labels</name>
        <value>node</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.spark.accessible-node-labels.node.capacity</name>
        <value>[memory=4096,vcores=1]</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.spark.accessible-node-labels.node.maximum-capacity</name>
        <value>[memory=4096,vcores=1]</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.spark.default-application-priority</name>
        <value>9</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.spark.disable_preemption</name>
        <value>true</value>
    </property>
</configuration>
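
For what it's worth, each time I change capacity-scheduler.xml I reload the scheduler instead of restarting the ResourceManager:

    yarn rmadmin -refreshQueues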

And here are the relevant parts of yarn-site.xml:

    <property>
        <name>yarn.node-labels.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.node-labels.configuration-type</name>
        <value>distributed</value>
    </property>
    <property>
        <name>yarn.node-labels.fs-store.root-dir</name>
        <value>hdfs://xxx:9000/user/yarn/node-labels/</value>
    </property>
    <property>
        <name>yarn.nodemanager.node-labels.provider</name>
        <value>config</value>
    </property>
    <property>
        <name>yarn.nodemanager.node-labels.provider.configured-node-partition</name>
        <value>node</value>
    </property>
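
Since yarn.node-labels.configuration-type is set to distributed, the NodeManager reports its own partition from the property above, so I never ran the centralized-mode admin commands. For anyone on centralized configuration, the rough equivalents would be (host and port are placeholders):

    yarn rmadmin -addToClusterNodeLabels "node(exclusive=true)"
    yarn rmadmin -replaceLabelsOnNode "<nm-host>:<port>=node"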

I'm using Hadoop 3.3.4 built from source. In case it matters, this is in my dev environment with a single ResourceManager and NodeManager. Any suggestions are much appreciated. Thanks!
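
For completeness, this is roughly how I submit the test job that fails to start (the application itself is elided; the queue and node-label settings are the standard Spark-on-YARN properties):

    spark-submit \
        --master yarn \
        --queue spark \
        --conf spark.yarn.am.nodeLabelExpression=node \
        --conf spark.yarn.executor.nodeLabelExpression=node \
        ...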

1 ACCEPTED SOLUTION

Explorer

This was caused by me overlooking that "root" is itself a queue: I never made the label accessible to the root queue or gave root any capacity on the label, so there was nothing for it to pass down to the child queues. The Capacity Scheduler resolves capacities hierarchically, so a child queue's capacity on a partition is effectively 0% while its parent has 0% on that partition. The configuration in the write-up here tipped me off: https://www.ibm.com/support/pages/yarn-node-labels-label-based-scheduling-and-resource-isolation-had...
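
In particular, these root-level properties were missing from my original capacity-scheduler.xml (root.capacity and root.maximum-capacity are also set explicitly in the full file below):

    <property>
        <name>yarn.scheduler.capacity.root.accessible-node-labels</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.accessible-node-labels.node.capacity</name>
        <value>100</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.accessible-node-labels.node.maximum-capacity</name>
        <value>100</value>
    </property>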

Here is the full configuration that gives me the desired behaviour:

<configuration>
    <property>
        <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
        <value>1.0</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.resource-calculator</name>
        <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.accessible-node-labels</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.capacity</name>
        <value>100</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.maximum-capacity</name>
        <value>100</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.accessible-node-labels.node.capacity</name>
        <value>100</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.accessible-node-labels.node.maximum-capacity</name>
        <value>100</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>default,spark</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.capacity</name>
        <value>[memory=11776,vcores=4]</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
        <value>[memory=11776,vcores=4]</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
        <value>node</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.default-node-label-expression</name>
        <value>node</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.accessible-node-labels.node.capacity</name>
        <value>[memory=11776,vcores=4]</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.accessible-node-labels.node.maximum-capacity</name>
        <value>[memory=11776,vcores=4]</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.default-application-priority</name>
        <value>9</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.disable_preemption</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.spark.capacity</name>
        <value>[memory=4096,vcores=1]</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.spark.maximum-capacity</name>
        <value>[memory=4096,vcores=1]</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.spark.accessible-node-labels</name>
        <value>node</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.spark.accessible-node-labels.node.capacity</name>
        <value>[memory=4096,vcores=1]</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.spark.accessible-node-labels.node.maximum-capacity</name>
        <value>[memory=4096,vcores=1]</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.spark.default-application-priority</name>
        <value>9</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.spark.disable_preemption</name>
        <value>true</value>
    </property>
</configuration>
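
In my case, after reloading the scheduler the Effective Capacity columns populate for the "node" partition and applications start on the labeled queues. A quick sanity check from the CLI:

    # Reload capacity-scheduler.xml without restarting the ResourceManager
    yarn rmadmin -refreshQueues

    # Confirm the partition is still registered
    yarn cluster --list-node-labels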
