Created on 12-18-2016 12:25 PM
In this post, we will see how to configure node labels on YARN.
Before we get to the configuration, let’s understand what a node label is in YARN.
Node labels allow us to divide the cluster into partitions and use each partition independently, according to our requirements. More specifically, we can group node managers under a label, for example a group of node managers with a large amount of RAM, and use them to process only critical production jobs! This is cool, isn’t it? So let’s see how we can configure node labels on YARN.
.
Types of node labels:
Exclusive – Only the queues mapped to the node label can access the resources of that label.
Non-exclusive (sharable) – If the resources of this node label are not in use, they can be shared with other applications running in the cluster.
.
Configuring node labels:
.
Step 1: Create required directory structure on HDFS
Note – You can run the commands below from any HDFS client.
sudo su hdfs
hadoop fs -mkdir -p /yarn/node-labels
hadoop fs -chown -R yarn:yarn /yarn
hadoop fs -chmod -R 700 /yarn
.
Step 2: Make sure the ‘yarn’ user has a user directory on HDFS. If it does not, create it using the commands below.
Note – You can run the commands below from any HDFS client.
sudo su hdfs
hadoop fs -mkdir -p /user/yarn
hadoop fs -chown -R yarn:yarn /user/yarn
hadoop fs -chmod -R 700 /user/yarn
.
Step 3: Configure the properties below in yarn-site.xml via the Ambari UI. If you don’t have the Ambari UI, add them manually to /etc/hadoop/conf/yarn-site.xml and restart the required services.
yarn.node-labels.enabled=true
yarn.node-labels.fs-store.root-dir=hdfs://<namenode-host>:<namenode-rpc-port>/<complete-path_to_node_label_directory>
Note – Please restart required services after above configuration changes!
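If you are editing yarn-site.xml by hand, the name=value pairs above have to be written as XML property elements. Here is a minimal sketch of a helper that prints them in that form. The function name to_property_xml and the NameNode address nn.example.com:8020 are made up for illustration, so replace them with your own values.

```shell
# Print one Hadoop-style <property> element for each name=value argument.
# to_property_xml is a hypothetical helper, not part of Hadoop.
to_property_xml() {
  for pair in "$@"; do
    name=${pair%%=*}    # text before the first '='
    value=${pair#*=}    # text after the first '='
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$name" "$value"
  done
}

# Example values: the NameNode host and port below are assumptions.
to_property_xml \
  "yarn.node-labels.enabled=true" \
  "yarn.node-labels.fs-store.root-dir=hdfs://nn.example.com:8020/yarn/node-labels"
```

Paste the printed elements inside the configuration element of yarn-site.xml, then restart the required services.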
.
Step 4: Create node labels using below commands
sudo -u yarn yarn rmadmin -addToClusterNodeLabels "<node-label1>(exclusive=<true|false>),<node-label2>(exclusive=<true|false>)"
For example, to add 2 node labels x and y:
sudo -u yarn yarn rmadmin -addToClusterNodeLabels "x(exclusive=true),y(exclusive=false)"
You can verify that the node labels have been created by looking at the Resource Manager UI under the ‘Node Labels’ option in the left pane, or you can run the command below on any YARN client
yarn cluster --list-node-labels
Sample output:
[yarn@prodnode1 ~]$ yarn cluster --list-node-labels
16/12/14 15:45:56 INFO impl.TimelineClientImpl: Timeline service address: http://prodnode3.openstacklocal:8188/ws/v1/timeline/
16/12/14 15:45:56 INFO client.RMProxy: Connecting to ResourceManager at prodnode3.openstacklocal/172.26.74.211:8050
Node Labels: <x:exclusivity=true>,<y:exclusivity=false>
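If you want to check the labels from a script instead of reading the output by eye, something like the sketch below may help. It assumes the output keeps the "Node Labels: <name:exclusivity=value>,..." format shown in the sample above. The function name parse_node_labels is my own, not a YARN command.

```shell
# Read `yarn cluster --list-node-labels` output on stdin and print one
# "label exclusivity" pair per line. Log lines are ignored.
# parse_node_labels is a hypothetical helper, not part of YARN.
parse_node_labels() {
  grep '^Node Labels:' |
    sed -e 's/^Node Labels:[[:space:]]*//' |
    tr ',' '\n' |
    sed -n 's/^<\([^:]*\):exclusivity=\([a-z]*\)>$/\1 \2/p'
}

# Example using the last line of the sample output above.
printf '%s\n' 'Node Labels: <x:exclusivity=true>,<y:exclusivity=false>' | parse_node_labels
# prints:
# x true
# y false
```

On a real cluster you would pipe the command into it: yarn cluster --list-node-labels 2>/dev/null | parse_node_labels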
.
Step 5: Allocate node labels to the node managers using below command:
sudo -u yarn yarn rmadmin -replaceLabelsOnNode "<node-manager1>:<port>=<node-label1> <node-manager2>:<port>=<node-label2>"
Example:
sudo -u yarn yarn rmadmin -replaceLabelsOnNode "prodnode1.openstacklocal=x prodnode2.openstacklocal=y"
Note – You can omit the port if only one node manager is running per host.
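When many node managers need the same label, typing the mapping by hand gets tedious. Below is a small sketch that builds the mapping string for one label and a list of hosts. The function name build_node_mapping is hypothetical, and giving both example hosts the label y is only for illustration.

```shell
# Print space-separated "host=label" pairs for every host given after the label.
# build_node_mapping is a hypothetical helper, not part of YARN.
build_node_mapping() {
  label=$1
  shift
  out=""
  for host in "$@"; do
    out="${out:+$out }${host}=${label}"   # add a space only between pairs
  done
  printf '%s\n' "$out"
}

build_node_mapping y prodnode1.openstacklocal prodnode2.openstacklocal
# prints: prodnode1.openstacklocal=y prodnode2.openstacklocal=y
```

The result can then be passed to the command from this step: sudo -u yarn yarn rmadmin -replaceLabelsOnNode "$(build_node_mapping y prodnode1.openstacklocal prodnode2.openstacklocal)"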
.
Step 6: Map node labels to the queues:
I have created 2 queues, ‘a’ and ‘b’, in such a way that queue ‘a’ can access nodes with labels ‘x’ and ‘y’, whereas queue ‘b’ can only access nodes with label ‘y’. By default, all queues can access nodes with the ‘default’ label, that is, nodes that have no label assigned.
Below is my capacity scheduler configuration:
yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.queue-mappings-override.enable=false
yarn.scheduler.capacity.root.a.a1.accessible-node-labels=x,y
yarn.scheduler.capacity.root.a.a1.accessible-node-labels.x.capacity=30
yarn.scheduler.capacity.root.a.a1.accessible-node-labels.x.maximum-capacity=100
yarn.scheduler.capacity.root.a.a1.accessible-node-labels.y.capacity=50
yarn.scheduler.capacity.root.a.a1.accessible-node-labels.y.maximum-capacity=100
yarn.scheduler.capacity.root.a.a1.acl_administer_queue=*
yarn.scheduler.capacity.root.a.a1.acl_submit_applications=*
yarn.scheduler.capacity.root.a.a1.capacity=40
yarn.scheduler.capacity.root.a.a1.maximum-capacity=100
yarn.scheduler.capacity.root.a.a1.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.a.a1.ordering-policy=fifo
yarn.scheduler.capacity.root.a.a1.state=RUNNING
yarn.scheduler.capacity.root.a.a1.user-limit-factor=1
yarn.scheduler.capacity.root.a.a2.accessible-node-labels=x,y
yarn.scheduler.capacity.root.a.a2.accessible-node-labels.x.capacity=70
yarn.scheduler.capacity.root.a.a2.accessible-node-labels.x.maximum-capacity=100
yarn.scheduler.capacity.root.a.a2.accessible-node-labels.y.capacity=50
yarn.scheduler.capacity.root.a.a2.accessible-node-labels.y.maximum-capacity=100
yarn.scheduler.capacity.root.a.a2.acl_administer_queue=*
yarn.scheduler.capacity.root.a.a2.acl_submit_applications=*
yarn.scheduler.capacity.root.a.a2.capacity=60
yarn.scheduler.capacity.root.a.a2.maximum-capacity=60
yarn.scheduler.capacity.root.a.a2.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.a.a2.ordering-policy=fifo
yarn.scheduler.capacity.root.a.a2.state=RUNNING
yarn.scheduler.capacity.root.a.a2.user-limit-factor=1
yarn.scheduler.capacity.root.a.accessible-node-labels=x,y
yarn.scheduler.capacity.root.a.accessible-node-labels.x.capacity=100
yarn.scheduler.capacity.root.a.accessible-node-labels.x.maximum-capacity=100
yarn.scheduler.capacity.root.a.accessible-node-labels.y.capacity=50
yarn.scheduler.capacity.root.a.accessible-node-labels.y.maximum-capacity=100
yarn.scheduler.capacity.root.a.acl_administer_queue=*
yarn.scheduler.capacity.root.a.acl_submit_applications=*
yarn.scheduler.capacity.root.a.capacity=40
yarn.scheduler.capacity.root.a.maximum-capacity=40
yarn.scheduler.capacity.root.a.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.a.ordering-policy=fifo
yarn.scheduler.capacity.root.a.queues=a1,a2
yarn.scheduler.capacity.root.a.state=RUNNING
yarn.scheduler.capacity.root.a.user-limit-factor=1
yarn.scheduler.capacity.root.accessible-node-labels=x,y
yarn.scheduler.capacity.root.accessible-node-labels.x.capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.x.maximum-capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.y.capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.y.maximum-capacity=100
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.b.accessible-node-labels=y
yarn.scheduler.capacity.root.b.accessible-node-labels.y.capacity=50
yarn.scheduler.capacity.root.b.accessible-node-labels.y.maximum-capacity=100
yarn.scheduler.capacity.root.b.acl_administer_queue=*
yarn.scheduler.capacity.root.b.acl_submit_applications=*
yarn.scheduler.capacity.root.b.b1.accessible-node-labels=y
yarn.scheduler.capacity.root.b.b1.accessible-node-labels.y.capacity=100
yarn.scheduler.capacity.root.b.b1.accessible-node-labels.y.maximum-capacity=100
yarn.scheduler.capacity.root.b.b1.acl_administer_queue=*
yarn.scheduler.capacity.root.b.b1.acl_submit_applications=*
yarn.scheduler.capacity.root.b.b1.capacity=100
yarn.scheduler.capacity.root.b.b1.maximum-capacity=100
yarn.scheduler.capacity.root.b.b1.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.b.b1.ordering-policy=fifo
yarn.scheduler.capacity.root.b.b1.state=RUNNING
yarn.scheduler.capacity.root.b.b1.user-limit-factor=1
yarn.scheduler.capacity.root.b.capacity=60
yarn.scheduler.capacity.root.b.maximum-capacity=100
yarn.scheduler.capacity.root.b.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.b.ordering-policy=fifo
yarn.scheduler.capacity.root.b.queues=b1
yarn.scheduler.capacity.root.b.state=RUNNING
yarn.scheduler.capacity.root.b.user-limit-factor=1
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.queues=a,b
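A common mistake with a configuration like this is getting the per-label capacities wrong. As far as I understand the Capacity Scheduler, for each label the capacities of the direct children of a parent queue should add up to 100, with queues that cannot access the label counting as 0. In the configuration above, a1 and a2 give 30 + 70 = 100 for label x and 50 + 50 = 100 for label y, and a and b give 50 + 50 = 100 for label y. The sketch below checks that arithmetic. It assumes the properties are supplied in name=value form, one per line, and the function name check_label_capacities is made up.

```shell
# Read capacity-scheduler properties (name=value, one per line) on stdin and
# print "<parent queue> <label> <sum> OK|MISMATCH" for every parent and label.
# check_label_capacities is a hypothetical helper, not part of YARN.
check_label_capacities() {
  awk -F= '
    /\.accessible-node-labels\.[^.]+\.capacity=/ {
      key = $1
      sub(/^yarn\.scheduler\.capacity\./, "", key)
      marker = ".accessible-node-labels."
      pos = index(key, marker)
      queue = substr(key, 1, pos - 1)            # for example root.a.a1
      label = substr(key, pos + length(marker))  # for example x.capacity
      sub(/\.capacity$/, "", label)
      if (queue !~ /\./) next                    # root has no parent, skip it
      parent = queue
      sub(/\.[^.]+$/, "", parent)                # drop the last path component
      total[parent " " label] += $2
    }
    END {
      for (k in total)
        print k, total[k], (total[k] == 100 ? "OK" : "MISMATCH")
    }
  ' | LC_ALL=C sort
}

# Example with two of the properties from the configuration above.
printf '%s\n' \
  'yarn.scheduler.capacity.root.a.a1.accessible-node-labels.x.capacity=30' \
  'yarn.scheduler.capacity.root.a.a2.accessible-node-labels.x.capacity=70' |
  check_label_capacities
# prints: root.a x 100 OK
```

Only the labeled capacities are checked here. The plain capacity properties for the default partition are ignored by this sketch.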
.
Please visit http://crazyadmins.com/configure-node-labels-on-yarn/ for more details and FAQs.
.
Please comment if you need any further help on this. Happy Hadooping!! 🙂
Created on 02-14-2018 05:55 PM
Hi,
Thank you for the post. Is there a way to add node labels and queues through the Java API? We are planning to add node labels and queues on demand, based on job submission.