
Node labels enable you to partition a cluster into sub-clusters so that jobs can be run on nodes with specific characteristics. For example, you can use node labels to run memory-intensive jobs only on nodes with a larger amount of RAM. Node labels can be assigned to cluster nodes and specified as exclusive or shareable. You can then associate node labels with Capacity Scheduler queues. Each node can have only one node label.
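As background, the label-to-queue association lives in capacity-scheduler.xml (node labels must also be enabled first in yarn-site.xml via yarn.node-labels.enabled and a yarn.node-labels.fs-store.root-dir). A minimal sketch matching this demo's layout follows; the property names are standard Capacity Scheduler settings, while the queue names and the 100% capacities are assumptions for illustration:

yarn.scheduler.capacity.root.queues=default,spark
yarn.scheduler.capacity.root.spark.accessible-node-labels=node1,node2
yarn.scheduler.capacity.root.accessible-node-labels.node1.capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.node2.capacity=100
yarn.scheduler.capacity.root.spark.accessible-node-labels.node1.capacity=100
yarn.scheduler.capacity.root.spark.accessible-node-labels.node2.capacity=100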

Demo:

Use case

Two node labels (node1 and node2) plus two queues (default and spark)

Submit a job to node1

Node labels added: yarn rmadmin -addToClusterNodeLabels "node1(exclusive=true),node2(exclusive=false)"
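If you want to confirm the labels were registered (works on Hadoop 2.7 and later), list them from the CLI:

yarn cluster --list-node-labels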

Labels assigned: yarn rmadmin -replaceLabelsOnNode "phdns02.cloud.hortonworks.com=node2 phdns01.cloud.hortonworks.com=node1"
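To confirm a node actually picked up its label, you can query its status; the output includes a Node-Labels field. The NodeManager port below (45454, HDP's default) is an assumption, so adjust it for your cluster:

yarn node -status phdns01.cloud.hortonworks.com:45454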

Job Submission:

Job sent to node1 only and assigned to queue spark

hadoop jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command "sleep 100" -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -queue spark -node_label_expression node1

Job sent to node2 only and assigned to queue spark

hadoop jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command "sleep 100" -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -queue spark -node_label_expression node2

Job sent to node1 only and assigned to queue default

hadoop jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command "sleep 100" -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -queue default -node_label_expression node1
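To double-check where an application landed, query its report; the application ID below is a placeholder, taken from the submission output or from yarn application -list:

yarn application -status application_1453612345678_0001

The report shows the queue the job ran in; the actual container hosts can be confirmed in the ResourceManager UI.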

More details - Doc link


Comments

After reading the above, I'm curious to know what would happen in the following scenario:

1) Create queues (e.g. Rack1, Rack2, Rack3, ...)

2) Create exclusive=true node labels and assign them to the queues per my physical rack layout

3) Don't set up HDFS rack awareness (so that replication won't care about racks)

4) Submit a job to the queue "Rack1", but all blocks for the data are on DataNodes in a different rack (e.g. Rack2)

Would the YARN AM try to create a remote container on a NodeManager in Rack2? Or keep using containers in Rack1 and fetch the data from a remote DataNode?

@Hajime

It will go to Rack1. And as for step 3 ... I wouldn't do that :)
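As a side note for that scenario: a queue can also be pinned to its label by default, so jobs submitted to it land on the matching nodes even without -node_label_expression. The property name is a standard Capacity Scheduler setting; the Rack1 queue and label come from the hypothetical scenario above:

yarn.scheduler.capacity.root.Rack1.default-node-label-expression=Rack1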
