Invalid resource request, requested resource type=[yarn.io/gpu]

I'm facing this issue when trying to use GPUs on YARN:

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException): Invalid resource request, requested resource type=[yarn.io/gpu] < 0 or greater than maximum allowed allocation. Requested resource=<memory:3072, vCores:1, yarn.io/gpu: 1>, maximum allowed allocation=<memory:9216, vCores:9>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:9216, vCores:9, yarn.io/gpu: 9223372036854775807>

I already enabled GPU on my cluster, but somehow the maximum allowed allocation is still reported without yarn.io/gpu: maximum allowed allocation=<memory:9216, vCores:9>
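
To make the mismatch in that message easier to see, here is a minimal Python sketch that parses the resource lists out of the exception text and reports which requested resource types are absent from the scheduler's maximum allowed allocation. The helper names (parse_resources, extract, missing_types) are made up for illustration and are not part of any Hadoop tooling. The message format is assumed to match the one quoted above.

```python
import re

# Exception text as quoted in the question (the "please note" sentence is kept verbatim).
ERROR = (
    "Invalid resource request, requested resource type=[yarn.io/gpu] < 0 or greater "
    "than maximum allowed allocation. Requested resource=<memory:3072, vCores:1, "
    "yarn.io/gpu: 1>, maximum allowed allocation=<memory:9216, vCores:9>, please note "
    "that maximum allowed allocation is calculated by scheduler based on maximum "
    "resource of registered NodeManagers, which might be less than configured maximum "
    "allocation=<memory:9216, vCores:9, yarn.io/gpu: 9223372036854775807>"
)


def parse_resources(text):
    """Parse a string like '<memory:3072, vCores:1, yarn.io/gpu: 1>' into a dict."""
    inner = text.strip().lstrip("<").rstrip(">")
    resources = {}
    for part in inner.split(","):
        # Split on the last ':' so that names such as 'yarn.io/gpu' stay intact.
        name, _, value = part.rpartition(":")
        resources[name.strip()] = int(value.strip())
    return resources


def extract(message, label):
    """Return the parsed resources that follow '<label>=' in the message, or None."""
    match = re.search(re.escape(label) + r"=(<[^>]*>)", message)
    return parse_resources(match.group(1)) if match else None


def missing_types(message):
    """List requested resource types the scheduler does not report as allocatable."""
    requested = extract(message, "Requested resource")
    allowed = extract(message, "maximum allowed allocation")
    return sorted(
        name for name, amount in requested.items()
        if amount > 0 and name not in allowed
    )


print(missing_types(ERROR))  # ['yarn.io/gpu']
```

For this message the configured maximum allocation does include yarn.io/gpu, while the scheduler-computed one does not, which is the gap the rest of the thread is about.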

1 ACCEPTED SOLUTION

After about two weeks of trying different things, we finally settled on a full wipe of every host and a clean install from scratch.
Still nothing worked.

Then we tried a "one worker" setup, manually defining a countable resource to exercise the allocation mechanism, and then...
NOTHING hortonWORKS!

By then my Googling had gotten better, though.
It seems to be a Hadoop issue with custom resources and the CapacityScheduler, enjoy:

https://issues.apache.org/jira/browse/YARN-9161
https://issues.apache.org/jira/browse/YARN-9205


Temporary solution to benefit from isolation:

For now (3.1.1/3.2.0), capacity.CapacityScheduler is broken by a hardcoded enum that contains only the vCores and RAM parameters, which appears to be why yarn.io/gpu never shows up in the maximum allowed allocation. You just have to switch your scheduler class to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler. You also want to replace "capacity" with "fair" in the resource-calculator line, so that it reads yarn.scheduler.fair.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator

Your GPUs will not be visible in YARN UI2, but they will still show up on the NodeManagers and, most importantly, will be allocated properly. It was indeed a mess to track down.
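
As a rough sketch of the workaround, the following Python snippet renders the two settings as a yarn-site.xml style fragment. The property name yarn.resourcemanager.scheduler.class is the standard YARN key for the scheduler class and is filled in here as an assumption, since the post only says "switch your scheduler class". The resource-calculator line is reproduced from the post as written and has not been checked against the Hadoop documentation. The function name to_yarn_site_fragment is hypothetical.

```python
import xml.etree.ElementTree as ET

FAIR_SCHEDULER_PROPS = {
    # Assumption: standard YARN key for selecting the scheduler implementation.
    "yarn.resourcemanager.scheduler.class":
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
    # Taken from the post as written (originally the "capacity" variant of this key).
    "yarn.scheduler.fair.resource-calculator":
        "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",
}


def to_yarn_site_fragment(props):
    """Render a dict of properties as a <configuration> XML fragment."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # ET.indent only exists on Python 3.9+; skip pretty-printing on older versions.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


print(to_yarn_site_fragment(FAIR_SCHEDULER_PROPS))
```

On a managed cluster (Ambari, for example) these values would normally be set through the management UI rather than by editing the file by hand, so treat the fragment as a reference for which keys change.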

4 REPLIES 4

Exact same problem. I have 2 GPUs in my test cluster. Both show up (load included) in the RM / Nodes UI, but neither of them can be allocated. Same "maximum allocation" referring only to CPUs and RAM.

It seems to be about the ResourceCalculator used when requesting containers, since it only shows CPU/memory, as DefaultResourceCalculator would. But everywhere I check, my node registers its GPU properly and DominantResourceCalculator is set...

I have run into the same issue. It works with FairScheduler but not with CapacityScheduler. To add to the instructions above, for those who normally use CapacityScheduler (99.99% of the Hadoop population :-)) but want to try FairScheduler: remember to also disable other CapacityScheduler-specific features, such as preemption, because the ResourceManager won't start otherwise.
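
A small Python sketch of that kind of pre-flight check follows. It assumes preemption was turned on through yarn.resourcemanager.scheduler.monitor.enable, which is the usual switch for the CapacityScheduler preemption monitor; the reply does not say which setting was involved, so this is a guess at one likely leftover, not a complete list. The function name capacity_leftovers is hypothetical.

```python
FAIR = "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler"
SCHEDULER_CLASS = "yarn.resourcemanager.scheduler.class"
# Assumption: preemption for CapacityScheduler was enabled via this property.
MONITOR_ENABLE = "yarn.resourcemanager.scheduler.monitor.enable"


def capacity_leftovers(props):
    """Return warnings for CapacityScheduler-era settings when FairScheduler is selected."""
    if props.get(SCHEDULER_CLASS) != FAIR:
        return []  # nothing to check unless the scheduler was switched
    warnings = []
    if props.get(MONITOR_ENABLE, "false").strip().lower() == "true":
        warnings.append(
            MONITOR_ENABLE + "=true: preemption monitor is still enabled"
        )
    return warnings


# Example: scheduler switched to FairScheduler, preemption monitor left on.
for warning in capacity_leftovers({SCHEDULER_CLASS: FAIR, MONITOR_ENABLE: "true"}):
    print(warning)
```

Feeding it the properties exported from the cluster configuration would give a quick hint before restarting the ResourceManager, though the ResourceManager log remains the authoritative place to see why a start failed.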