Support Questions
Find answers, ask questions, and share your expertise

Spark jobs on YARN Docker runtime



Hi everybody

As everybody needs a different stack, I'm trying out the Docker runtime.

I think the configuration itself is fine (I bind-mount my /etc/passwd, I have a trusted registry, ...), and YARN does try to start Docker containers when I launch a job.

So far it has failed for different reasons, and I can't really figure out what the exact requirements are on the Docker image side.
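Concretely, my Docker-related executor config looks roughly like this (the registry name and mount paths are mine; the keys come from the Hadoop Docker-on-YARN documentation):

```
# container-executor.cfg, [docker] section
[docker]
  module.enabled=true
  docker.trusted.registries=registry.example.com,library
  docker.allowed.ro-mounts=/etc/passwd,/etc/group
  docker.allowed.rw-mounts=/hadoop/yarn/local,/hadoop/yarn/log

# yarn-site.xml (value shown flattened)
# yarn.nodemanager.runtime.linux.allowed-runtimes = default,docker
```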

After a "/bin/bash: /usr/jdk64/jdk1.8.0_112/bin/java: No such file or directory", I tried to set JAVA_HOME, but then I get:

[2019-04-05 23:28:17.926]Container exited with a non-zero exit code 127. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/hadoop/yarn/local/usercache/user/appcache/application_1554498498150_0003/container_e15_1554498498150_0003_01_000001/ line 42: /bin/hadoop: No such file or directory
Error files: stderr, stderr.txt.
Last 4096 bytes of stderr :
/bin/bash: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java: No such file or directory

Yet "/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java" does exist, both on the host and in the Docker image...

And "/bin/hadoop" does exist on the host but not in the image, which should be fine, since everyone says you don't have to install Hadoop or Spark inside it.

So, to launch a dumb Spark job on YARN using the Docker runtime, let's say printing a "hello world" from the ApplicationMaster, what should the canonical Docker image setup be?
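For reference, my current image attempt is roughly this (the JDK path is an assumption on my side; I picked it because it's the path showing up in my stderr, which is where OpenJDK 8 lands on Ubuntu):

```dockerfile
FROM ubuntu:16.04
# Install OpenJDK 8; on Ubuntu it lands under /usr/lib/jvm/java-8-openjdk-amd64,
# which is the exact path YARN is complaining about in my stderr
RUN apt-get update && \
    apt-get install -y openjdk-8-jdk && \
    rm -rf /var/lib/apt/lists/*
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```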

What should be installed beforehand, which environment variables do I have to set, and do you know the exact command that YARN uses to instantiate my container?
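For what it's worth, this is roughly how I'm submitting the job (the image name is mine; the YARN_CONTAINER_RUNTIME_* environment variables are the ones from the Hadoop Docker-on-YARN documentation):

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=registry.example.com/my-spark-image:latest \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=registry.example.com/my-spark-image:latest \
  hello_world.py
```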

Cheers everybody,