Created 08-01-2022 02:35 AM
Hello all,
I am trying to run a Spark job using spark-submit with a Docker image on YARN.
I followed the instructions in the blog post provided by Cloudera at the following link:
and I ran into an error that I couldn't find an answer to.
Note: I already did all the configuration the post requires.
I ran this command:
spark-submit \
--master yarn \
--deploy-mode cluster \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=faresdev8/python3:v5 \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS="/etc/passwd:/etc/passwd:ro,/etc/hadoop:/etc/hadoop:ro,/opt/cloudera/parcels/:/opt/cloudera/parcels/:ro,/data1/opt/cloudera/parcels/:/data1/opt/cloudera/parcels/:ro" \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=faresdev8/python3:v5 \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS="/etc/passwd:/etc/passwd:ro,/etc/hadoop:/etc/hadoop:ro,/opt/cloudera/parcels/:/opt/cloudera/parcels/:ro,/data1/opt/cloudera/parcels/:/data1/opt/cloudera/parcels/:ro" \
ols.py
And this is the error I get:
Sometimes it gives me exit code 29. I don't understand what the problem is, especially since I followed the instructions exactly.
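For anyone hitting a similarly opaque exit code: the spark-submit console output alone usually isn't enough in cluster mode, since the real traceback ends up in the YARN container logs. You can pull them with (substituting your own application ID):

yarn logs -applicationId <application_id>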
Created 08-01-2022 06:10 AM
Okay, so I solved this problem.
If anyone runs into something similar, check these 3 things:
1- Make sure the Spark version matches the Python version. Spark 2 doesn't support Python higher than 3.7.
2- Make sure your Python code actually starts a Spark session. I forgot that I had removed that while experimenting (see the sketch after this list).
3- Make sure there are no problems in the code itself, and test it on another machine to check that it works properly.
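For point 2, this is roughly the minimal skeleton the script needed (just a sketch; the app name is a placeholder, and the version print is only there to help catch the mismatch from point 1):

from pyspark.sql import SparkSession
import sys

# Print the Python version the driver actually runs, to catch a
# Spark 2 / Python > 3.7 mismatch (point 1).
print(sys.version)

# Without creating (or getting) a SparkSession, the job has nothing
# to run on the cluster and the container just exits (point 2).
spark = SparkSession.builder.appName("ols").getOrCreate()

# ... actual job logic goes here ...

spark.stop()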