Member since: 02-01-2019
Posts: 6
Kudos Received: 2
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2340 | 02-14-2019 04:12 PM
04-09-2019 12:32 AM
Hi everybody, As everyone needs a different stack, I'm trying out the Docker runtime. I think the configuration itself is fine (I bind-mount my /etc/passwd, I have a trusted repository...), and YARN does try to start Docker containers when I launch a job. So far it has failed for different reasons, and I can't really figure out what the exact requirements are on the Docker image side.
After a "/bin/bash: /usr/jdk64/jdk1.8.0_112/bin/java: No such file or directory", I tried to set JAVA_HOME, but then it's a [2019-04-05 23:28:17.926] Container exited with a non-zero exit code 127. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/hadoop/yarn/local/usercache/user/appcache/application_1554498498150_0003/container_e15_1554498498150_0003_01_000001/launch_container.sh: line 42: /bin/hadoop: No such file or directory
Error files: stderr, stderr.txt.
Last 4096 bytes of stderr :
/bin/bash: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java: No such file or directory
But "/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java" does exist, both on the host and in the Docker image... And "/bin/hadoop" does exist on the host but not in the image, and everyone says you don't have to install Hadoop or Spark inside it. So, to launch a trivial Spark job on YARN using the Docker runtime, say printing a "hello world" from the ApplicationMaster, what should the canonical Docker image setup be? What should be installed beforehand, which environment variables do I have to set, and do you know the exact command that YARN uses to instantiate my container? Cheers everybody, Kevin
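For reference, here is a sketch of the kind of submit command I'm trying, following the standard YARN Docker runtime environment variables (the image name and example jar path are placeholders for my own):

```
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=my-registry/my-image:latest \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=my-registry/my-image:latest \
  /path/to/spark-examples.jar 100
```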
04-01-2019 03:58 PM
It should be fixed as of 3.1.2: https://issues.apache.org/jira/browse/YARN-8822 EDIT: Not with Hortonworks though, since so far we're stuck with 3.1.0.
02-15-2019 10:39 AM
For now (3.1.1/3.2.0), the CapacityScheduler is broken by a hardcoded enum that only contains the vCores and RAM parameters. You just have to switch your scheduler class to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler. You also want to replace "capacity" with "fair" in the resource calculator property, giving: yarn.scheduler.fair.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator Your GPUs will not be visible in YARN UI2, but they will still show on the NodeManagers and, most importantly, will be allocated properly. It was a mess to find out indeed.
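A sketch of what that amounts to in yarn-site, shown as key=value pairs (the scheduler class is set via the standard yarn.resourcemanager.scheduler.class property; adjust to your setup):

```
# yarn-site: swap the scheduler class from CapacityScheduler to FairScheduler
yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
# keep DominantResourceCalculator so GPUs (not just CPU/RAM) drive allocation
yarn.scheduler.fair.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
```

A ResourceManager restart is needed for the scheduler swap to take effect.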
02-14-2019 04:12 PM
2 Kudos
After about two weeks of various tries, we finally settled on a full wipe of every host for a clean install from scratch. Still nothing working. Then we tried a "one worker" setup, setting a countable resource manually to test the allocation mechanism, and then... NOTHING hortonWORKS! But my Googling was better suited then. It turns out to be a Hadoop issue with custom resources and the CapacityScheduler, enjoy:
https://issues.apache.org/jira/browse/YARN-9161
https://issues.apache.org/jira/browse/YARN-9205
Temporary solution to benefit from isolation: For now (3.1.1/3.2.0), the CapacityScheduler is broken by a hardcoded enum that only contains the vCores and RAM parameters. You just have to switch your scheduler class to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler. You also want to replace "capacity" with "fair" in the resource calculator property, giving: yarn.scheduler.fair.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator Your GPUs will not be visible in YARN UI2, but they will still show on the NodeManagers and, most importantly, will be allocated properly. It was a mess to find out indeed.
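For context, the "one worker" test declared a countable resource manually, roughly along these lines per the Hadoop resource model docs (the resource name "resource1" and the count are illustrative):

```
# resource-types.xml on the ResourceManager: declare the custom resource
yarn.resource-types=resource1

# node-resources.xml on the NodeManager: advertise this node's capacity
yarn.nodemanager.resource-type.resource1=2
```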
02-08-2019 03:25 PM
It seems to be about the ResourceCalculator used when requesting containers, as only CPU/memory show up, which is what the DefaultResourceCalculator would do. But everywhere I check, my node registers its GPU properly and DominantResourceCalculator is set...
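For what it's worth, these are the kinds of checks I'm running against the ResourceManager REST API (host and port are illustrative):

```
# per-node report: the GPU should appear under each node's resource list
curl -s http://rm-host:8088/ws/v1/cluster/nodes
# scheduler info: shows the maximum allocation the scheduler will grant
curl -s http://rm-host:8088/ws/v1/cluster/scheduler
```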
02-05-2019 02:29 PM
Same exact problem. I have 2 GPUs in my test cluster, both are showing up (load included) in the RM / Nodes UI, but none of them can be allocated... same "maximum allocation" referring only to CPUs and RAM.