# Install Java 8 (required by Spark) and pip for Python 2
RUN apt-get update && apt-get install -y openjdk-8-jdk python-pip
# Provide a stable JAVA_HOME path
RUN ln -s /usr/lib/jvm/java-1.8.0-openjdk-amd64 /usr/lib/jvm/java
RUN pip install -U pip
RUN pip install pyspark==2.3.2 numpy pandas
Although I am setting all of the relevant spark.yarn.appMasterEnv.* configurations, the driver cannot find any of my dependencies. However, if I install the dependencies locally on the master node and submit the app with "--deploy-mode client", it works (regular users are not allowed to do this; they have to submit their jobs from their JupyterHub environments). So it seems that when I set "--deploy-mode cluster", the driver/AM executes directly on the nodes instead of inside the Docker container.
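For context, the submission looks roughly like the following; the image name and application script are placeholders, but the spark.yarn.appMasterEnv.* / spark.executorEnv.* properties and the YARN_CONTAINER_RUNTIME_* variables are the documented way to request the YARN Docker container runtime:

# Sketch of the cluster-mode submission (image and script names are placeholders)
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=my-registry/pyspark:2.3.2 \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=my-registry/pyspark:2.3.2 \
  my_app.py

In client mode the driver runs in the submitting process itself, so only the executors are launched through YARN; that is why installing the dependencies locally on the master node makes it work there.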
I don't know if this is the expected behavior or if I'm missing something.