
Spark jobs on YARN Docker runtime

Hi everybody,

Since everybody needs a different stack, I'm trying out the Docker runtime on YARN.

I think the configuration itself is fine (I bind-mount my /etc/passwd, I have a trusted registry, ...), and YARN does try to start Docker containers when I launch a job.
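For reference, here is roughly what my NodeManager setup looks like (the registry name and mount paths are placeholders for my own values, not a recommendation):

In yarn-site.xml:

  <property>
    <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
    <value>default,docker</value>
  </property>

In container-executor.cfg:

  [docker]
    module.enabled=true
    docker.trusted.registries=my.registry.example.com
    docker.allowed.ro-mounts=/etc/passwd,/etc/group
    docker.allowed.rw-mounts=/hadoop/yarn/local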

So far it has failed for various reasons, and I can't figure out what the exact requirements are on the Docker image side.

After a "/bin/bash: /usr/jdk64/jdk1.8.0_112/bin/java: No such file or directory", I tried setting JAVA_HOME in the image, but then I get:

[2019-04-05 23:28:17.926]Container exited with a non-zero exit code 127. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/hadoop/yarn/local/usercache/user/appcache/application_1554498498150_0003/container_e15_1554498498150_0003_01_000001/launch_container.sh: line 42: /bin/hadoop: No such file or directory
Error files: stderr, stderr.txt.
Last 4096 bytes of stderr :
/bin/bash: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java: No such file or directory


But "/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java" does exist, both on the host and in the Docker image...

And "/bin/hadoop" exists on the host but not in the image, since everyone says you don't have to install Hadoop or Spark inside the image.
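For context, the image I'm testing with is roughly this (the base image and JDK path are just what I happen to use, and they match the java path in the stderr above):

  FROM ubuntu:16.04
  # Install a JDK so the container can run the Java processes YARN launches
  RUN apt-get update && apt-get install -y openjdk-8-jdk && rm -rf /var/lib/apt/lists/*
  # Point JAVA_HOME at the in-image JDK; without this, the host's JAVA_HOME leaks in
  ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64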

So, to launch a trivial Spark job on YARN using the Docker runtime (say, printing a "hello world" from the ApplicationMaster), what should the canonical Docker image setup be?

What should be installed beforehand, which environment variables do I have to set, and do you know the exact command that YARN uses to instantiate my container?
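For completeness, this is roughly how I'm submitting the job (the image name and jar path are placeholders; I'm using the stock SparkPi example here):

  spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
    --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=my.registry.example.com/myimage:latest \
    --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
    --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=my.registry.example.com/myimage:latest \
    --class org.apache.spark.examples.SparkPi \
    /path/to/spark-examples.jar 10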

Cheers everybody,
Kevin