Created 01-31-2019 03:31 PM
I'm facing this issue when trying to use GPUs on YARN:
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException): Invalid resource request, requested resource type=[yarn.io/gpu] < 0 or greater than maximum allowed allocation. Requested resource=<memory:3072, vCores:1, yarn.io/gpu: 1>, maximum allowed allocation=<memory:9216, vCores:9>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:9216, vCores:9, yarn.io/gpu: 9223372036854775807>
I already enabled GPU on my cluster, but somehow the reported maximum allowed allocation still omits yarn.io/gpu: maximum allowed allocation=<memory:9216, vCores:9>
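For reference, here is roughly what I already have in place to enable GPU scheduling (a minimal sketch; the property names are from the Hadoop 3.1 GPU-on-YARN documentation, the values are illustrative for my cluster):

In resource-types.xml:
  <property>
    <!-- declare the GPU resource type cluster-wide -->
    <name>yarn.resource-types</name>
    <value>yarn.io/gpu</value>
  </property>

In yarn-site.xml on the NodeManagers:
  <property>
    <!-- let the NodeManager auto-discover and report its GPUs -->
    <name>yarn.nodemanager.resource-plugins</name>
    <value>yarn.io/gpu</value>
  </property>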
Created 02-05-2019 02:29 PM
Exact same problem here. I have 2 GPUs in my test cluster, and both show up (load included) in the RM / Nodes UI, but neither of them can be allocated... the same "maximum allowed allocation" message referring only to CPUs and RAM.
Created 02-08-2019 03:25 PM
It seems to be about the ResourceCalculator used when requesting containers, since the message only shows CPU/memory, as the DefaultResourceCalculator would. But everywhere I check, my node registers its GPUs properly and DominantResourceCalculator is set...
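For context, this is the calculator setting I mean, in capacity-scheduler.xml (a standard property, shown here as I have it set):
  <property>
    <!-- DominantResourceCalculator is required for anything beyond memory, e.g. vCores and GPUs -->
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>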
Created 02-14-2019 04:12 PM
After about 2 weeks of various tries we finally settled on a full wipe of every host for a clean install from scratch.
Still nothing working.
Then we tried a single-worker setup, defining a countable resource manually to exercise the allocation mechanism, and then...
NOTHING hortonWORKS!
But this time my Googling paid off.
It turns out to be a Hadoop issue with custom resources and the CapacityScheduler, enjoy:
https://issues.apache.org/jira/browse/YARN-9161
https://issues.apache.org/jira/browse/YARN-9205
Temporary workaround to still benefit from GPU isolation:
For now (3.1.1/3.2.0) the CapacityScheduler is broken by a hardcoded enum that only knows the vCores and RAM resource types. You just have to switch your scheduler class to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler. You also want to replace "capacity" with "fair" in the resource-calculator property, ending up with yarn.scheduler.fair.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator.
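In yarn-site.xml terms, the switch looks roughly like this (a sketch of what worked for us; the calculator property name is the "fair" variant mentioned above, so double-check it against your distribution):
  <property>
    <!-- swap the broken CapacityScheduler for the FairScheduler -->
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <!-- same DominantResourceCalculator, under the "fair" property name -->
    <name>yarn.scheduler.fair.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>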
Your GPUs will not be visible in the YARN UI2, but they will still show up on the NodeManagers and, most importantly, they will be allocated properly. It was a mess to figure out indeed.
Created 05-09-2019 08:34 AM
I have run into the same issue. It works with the FairScheduler but not the CapacityScheduler. To add to the instructions above for those who normally use the CapacityScheduler (99.99% of the Hadoop population :-)) but want to try the FairScheduler: remember to also disable other CS-specific features, such as preemption, as the ResourceManager won't start otherwise.
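Concretely, the preemption toggle I had to turn off is the scheduler monitor in yarn-site.xml (assuming the stock CapacityScheduler preemption setup; verify against your own config):
  <property>
    <!-- disables the monitor policies that drive CapacityScheduler preemption -->
    <name>yarn.resourcemanager.scheduler.monitor.enable</name>
    <value>false</value>
  </property>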