I struggle to find the complete guidance how to isolate and schedule jobs application GPUs in hdp3.0 ? With hdp3.0 GPU is as a native asset, so are only gpu isolation and scheduling the only improvements ?
For now (3.1.1/3.2.0) the capacity.CapacityScheduler is broken by a hardcoded enum containing only vCores and RAM parameters. You just have to switch your scheduler class to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler You also want to replace "capacity" by "Fair" in the line yarn.scheduler.fair.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
Your GPUs will not be visible on yarn ui2 but will still be on the NodeManagers, and most importantly, will be allocated properly. It was a mess to find out indeed.