We run spark-submit from Airflow, which we added to CDP 7.1 as a custom parcel.
Airflow is built with Python 3, but the default Python version on CDP is Python 2. As a result, spark-submit fails with this error:
WARN net.ScriptBasedMapping: Exception running /etc/hadoop/conf.cloudera.yarn/topology.py 10.228.86.42
ExitCodeException exitCode=1: File "/opt/cloudera/parcels/Airflow-1.10.10-python3.7.7_1.2.3/lib/python3.7/site.py", line 177
file=sys.stderr)
^
SyntaxError: invalid syntax
I added PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON, both pointing to python3, to spark-defaults as well as spark-env.sh. I also set spark.yarn.appMasterEnv.PYTHONHASHSEED = 0, but the problem remains. As soon as the Python version on the workers is changed to Python 3 (so that Python 3 is the only Python available), spark-submit starts working. Is there something I am missing?
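For reference, here is a minimal sketch of the environment setup described above, plus a way to list the Python-related variables a spark-submit child process would inherit. The interpreter path /usr/bin/python3 and both helper names are assumptions for illustration, not the actual cluster configuration. PYTHONHOME and PYTHONPATH are included only because the traceback shows topology.py loading site.py from the Airflow parcel's python3.7 directory, so an inherited variable of that kind is one possible thing to check.

```python
import os

# Hypothetical interpreter path; the real python3 location on the cluster is not stated above.
PYTHON3 = "/usr/bin/python3"


def spark_submit_env(base_env, python3=PYTHON3):
    """Return a copy of base_env with the PySpark interpreter variables set."""
    env = dict(base_env)
    env["PYSPARK_PYTHON"] = python3
    env["PYSPARK_DRIVER_PYTHON"] = python3
    return env


def python_related_vars(env):
    """Pick out variables that can affect which interpreter or stdlib a child process uses."""
    keys = ("PATH", "PYTHONHOME", "PYTHONPATH",
            "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")
    return {k: env[k] for k in keys if k in env}


if __name__ == "__main__":
    # Print what a spark-submit launched from this process would inherit.
    env = spark_submit_env(os.environ)
    for name, value in sorted(python_related_vars(env).items()):
        print(f"{name}={value}")
```

Running this inside the Airflow worker process (or a task) shows whether PYTHONHOME or PYTHONPATH point into the Airflow parcel at the moment spark-submit is launched.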
Thanks