Since every user needs a different stack, I'm trying out the YARN Docker runtime.
I think the configuration itself is fine (I mount my /etc/passwd, I have a trusted repository...) and YARN does try to start Docker containers when I launch a job.
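For context, the kind of submission described here can be sketched as follows. This is only a sketch under assumptions: the image name `registry.example/spark-test:latest` and the script `hello.py` are placeholders, and the helper `build_spark_submit` is hypothetical. The environment variable names are the ones documented for the YARN Docker runtime, passed through Spark's `spark.yarn.appMasterEnv.*` and `spark.executorEnv.*` settings.

```python
import shlex


def build_spark_submit(image, mounts, app="hello.py"):
    """Return a spark-submit argv list that requests the YARN Docker runtime.

    `image`, `mounts` and `app` are placeholders for illustration only.
    """
    env = {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
        # Comma-separated list of source:destination:mode bind mounts.
        "YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS": ",".join(mounts),
    }
    argv = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    for key, value in env.items():
        # Set the same variables for the application master and the executors.
        argv += ["--conf", f"spark.yarn.appMasterEnv.{key}={value}"]
        argv += ["--conf", f"spark.executorEnv.{key}={value}"]
    argv.append(app)
    return argv


if __name__ == "__main__":
    cmd = build_spark_submit(
        image="registry.example/spark-test:latest",  # hypothetical image name
        mounts=["/etc/passwd:/etc/passwd:ro"],
    )
    # Print a shell-quoted command line that could be pasted into a terminal.
    print(shlex.join(cmd))
```

Any mount listed there presumably also has to be allowed on the NodeManager side, which is part of what I am unsure about.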
So far it has failed for different reasons, and I can't really figure out what the exact requirements are on the Docker image side.
After getting "/bin/bash: /usr/jdk64/jdk1.8.0_112/bin/java: No such file or directory", I tried setting JAVA_HOME, but now I get:
[2019-04-05 23:28:17.926]Container exited with a non-zero exit code 127. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/hadoop/yarn/local/usercache/user/appcache/application_1554498498150_0003/container_e15_1554498498150_0003_01_000001/launch_container.sh: line 42: /bin/hadoop: No such file or directory
Error files: stderr, stderr.txt.
Last 4096 bytes of stderr :
/bin/bash: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java: No such file or directory
But "/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java", which stderr reports as missing, does exist, both on the host and in the Docker image...
And "/bin/hadoop", which prelaunch.err reports as missing, exists on the host but not in the image, since everyone says that you don't have to install Hadoop or Spark inside it.
So, to launch a trivial Spark job on YARN using the Docker runtime, let's say one that prints "hello world" from the application master, what should the canonical Docker image setup be?
What should be installed beforehand, which environment variables do I have to set, and do you know the exact command that YARN uses to instantiate my container?