Background: I have configured two node labels - HiCPU (exclusive=false) and GPU (exclusive=true). I have attached HiCPU to a queue named Engineering with 100% capacity. Label GPU is attached to a queue named Marketing with 100% capacity. No default label has been configured for either queue at the beginning of the test.
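For reference, the setup described above would correspond roughly to a capacity-scheduler.xml fragment like the one below. This is a sketch reconstructed from the description, not the actual config from this cluster, and it assumes both queues sit directly under root:

```xml
<!-- Sketch: label-to-queue mapping as described above (capacity-scheduler.xml) -->
<property>
  <name>yarn.scheduler.capacity.root.Engineering.accessible-node-labels</name>
  <value>HiCPU</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.Engineering.accessible-node-labels.HiCPU.capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.Marketing.accessible-node-labels</name>
  <value>GPU</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.Marketing.accessible-node-labels.GPU.capacity</name>
  <value>100</value>
</property>
```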
When I run the following commands as the hdfs user, the containers only run on unlabeled nodes, and if no unlabeled nodes are available, the job simply hangs:
yarn jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command "sleep 25" -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -num_containers 3 -queue Engineering ResourceRequest.setNodeLabelExpression HiCPU
yarn jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command "sleep 25" -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -num_containers 3 -queue Marketing ResourceRequest.setNodeLabelExpression GPU
However, if I set the node label as the queue's default, the commands *DO* execute on the appropriate machines, even without the ResourceRequest.setNodeLabelExpression attribute (as would be expected).
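For anyone reproducing this: the "default" behavior described here comes from the per-queue default-node-label-expression property in capacity-scheduler.xml. A minimal sketch (again assuming the queue is directly under root):

```xml
<!-- Default label for the queue; applied to jobs that don't set a label themselves -->
<property>
  <name>yarn.scheduler.capacity.root.Engineering.default-node-label-expression</name>
  <value>HiCPU</value>
</property>
```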
Bottom line - I can only get node labels to work for a YARN job if they are set as the default, which means non-labeled nodes are not available to that queue any longer for YARN jobs.
Our documentation states the following:
"...if you submit a MapReduce job to a queue that has a default node label expression, the default node label will be applied to the MapReduce job."
To test this, I executed the following command using a user who was default-queue-mapped to the Engineering queue:
yarn jar /usr/hdp/220.127.116.11-2557/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 5 10
While the job did get assigned to the correct queue, in no instance could I get the MapReduce job to run on a labeled node. It would run on an unlabeled node if available, and if no unlabeled nodes were available, it would just hang.
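As a possible workaround for the MapReduce case: later Hadoop releases added per-job properties for node labels (mapreduce.job.node-label-expression and mapreduce.job.am.node-label-expression, via MAPREDUCE-6304). These may not exist in the HDP build used here, so treat the following as a sketch that only applies on builds that include that change:

```shell
# Sketch: explicitly request labeled nodes for both the AM and the task containers.
# Requires a Hadoop release that includes MAPREDUCE-6304; older builds ignore
# or reject these properties.
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi \
  -Dmapreduce.job.node-label-expression=HiCPU \
  -Dmapreduce.job.am.node-label-expression=HiCPU \
  5 10
```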
Thus, unless I am missing something, it appears that the *only* way to get a functional use of node labels is to set them as a default for a queue, in which case YARN jobs assigned to that queue will ONLY run on the labeled nodes (including unlabeled YARN jobs). Furthermore, under no circumstance will a MapReduce job run on a labeled node, regardless of default node label settings.
If someone wants to take a look at my settings on my cluster and troubleshoot, let me know. Thanks!
Interesting. So the command you used *did* allow you to set node labels on the fly. If I'm reading correctly, the main difference was the ResourceRequest.setNodeLabelExpression argument being replaced with the -node_label_expression flag.
Is that right? I went back and checked the documentation, and saw that it now matches what you performed. So the core of my issue in particular was a bug in the docs. Thanks for following up!
Please see this
ResourceRequest.setNodeLabelExpression(&lt;node_label_expression&gt;) -- sets the node label expression for individual resource requests. This will override the node label expression set in ApplicationSubmissionContext.
I used the following commands.
hadoop jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command "sleep 100" -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -queue spark -node_label_expression node1
Hi. Adding one point to your discussion: if we don't assign a default partition in the cluster, queues are unable to run more than one job even when resources are available.
Initially, I configured four node labels with four nodes each and no default partition. In this case, when I submitted a job to a queue, the queue ran only one job; a second job I submitted stayed in the ACCEPTED state even though cluster resources were available to that queue.
Later, I configured the same four node labels but left two nodes in the default partition. Now I am able to run multiple jobs in a queue.
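For completeness, labels are created and assigned with yarn rmadmin, and any node that is never assigned a label stays in the default partition. A sketch (hostnames are placeholders, and the exclusive= syntax may vary between Hadoop releases):

```shell
# Create the labels (exclusivity syntax as in later Hadoop 2.x releases)
yarn rmadmin -addToClusterNodeLabels "HiCPU(exclusive=false),GPU(exclusive=true)"

# Assign labels to specific nodes; any node left out stays in the DEFAULT partition
yarn rmadmin -replaceLabelsOnNode "node1.example.com=HiCPU node2.example.com=GPU"

# Verify the labels known to the cluster
yarn cluster --list-node-labels
```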
I came to know that this was a bug in the ApplicationMaster, reported in a JIRA.
I thought it might help you in your further work.