Issue with spark-submit with different python version

We run spark-submit from Airflow, which we added as a custom parcel to CDP 7.1.

Airflow is built with Python 3, but the default Python version on CDP is Python 2. As a result, spark-submit produces the following error:

 

WARN net.ScriptBasedMapping: Exception running /etc/hadoop/conf.cloudera.yarn/topology.py 10.228.86.42
ExitCodeException exitCode=1:   File "/opt/cloudera/parcels/Airflow-1.10.10-python3.7.7_1.2.3/lib/python3.7/site.py", line 177
    file=sys.stderr)
        ^
SyntaxError: invalid syntax

 

I added PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON, both pointing to python3, to spark-defaults and to spark-env.sh. I also set spark.yarn.appMasterEnv.PYTHONHASHSEED = 0, but the problem remains. As soon as the Python version on the workers is changed to Python 3 (so that Python 3 is the only Python available), spark-submit starts working. Is there something I am missing?

 

Thanks

1 ACCEPTED SOLUTION

Problem solved. The issue was in topology.py, which uses python as its default interpreter. Despite all the environment variables pointing to python3, python still resolved to Python 2, so I ended up overriding the topology script with a path to python3.
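
For anyone hitting the same thing, here is a minimal sketch of one way to read that fix: check which interpreter a script's shebang line asks for, and pin it to an explicit path. The thread does not show the contents of topology.py, so the sample script text and the /usr/bin/python3 path below are assumptions, and the helper names shebang_command and pin_interpreter are made up for illustration. Note also that the topology script may be regenerated when cluster configuration is redeployed, in which case a manual edit might not persist.

```python
def shebang_command(script_text):
    """Return the interpreter command from the script's first line, or None."""
    lines = script_text.splitlines()
    first_line = lines[0] if lines else ""
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].strip()


def pin_interpreter(script_text, interpreter):
    """Return script_text with its shebang set to an explicit interpreter path."""
    lines = script_text.splitlines(keepends=True)
    new_first = "#!" + interpreter + "\n"
    if lines and lines[0].startswith("#!"):
        # Replace the existing shebang (e.g. one that relies on a bare `python`).
        lines[0] = new_first
    else:
        # No shebang present: add one at the top.
        lines.insert(0, new_first)
    return "".join(lines)


if __name__ == "__main__":
    # Hypothetical stand-in for a script whose shebang relies on a bare `python`.
    sample = "#!/usr/bin/env python\nimport sys\n"
    print(shebang_command(sample))
    # /usr/bin/python3 is an assumed location; use the path valid on your hosts.
    print(pin_interpreter(sample, "/usr/bin/python3"))
```

The point of pinning an absolute path is that the script no longer depends on what a bare python resolves to on each host, which, as described above, stayed at Python 2 regardless of the PYSPARK_* settings.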
