Issue with spark-submit with different python version
Labels: Apache Spark, Apache YARN
Created ‎07-19-2020 09:25 PM
We run spark-submit from Airflow, which we added to CDP 7.1 as a custom parcel.
Airflow is built with Python 3, but the default Python version on CDP is Python 2. As a result, spark-submit hits this issue:
WARN net.ScriptBasedMapping: Exception running /etc/hadoop/conf.cloudera.yarn/topology.py 10.228.86.42
ExitCodeException exitCode=1: File "/opt/cloudera/parcels/Airflow-1.10.10-python3.7.7_1.2.3/lib/python3.7/site.py", line 177
file=sys.stderr)
^
SyntaxError: invalid syntax
I added PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON, both pointing to python3, to spark-defaults and to spark-env.sh. I also set spark.yarn.appMasterEnv.PYTHONHASHSEED = 0, but the problem remains. As soon as the Python version on the workers is changed to Python 3 (so that Python 3 is the only Python available), spark-submit starts working. Is there something I am missing?
Thanks
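A SyntaxError at `file=sys.stderr)` inside a python3.7 `site.py` is consistent with a Python 2 binary parsing Python 3 library files. A minimal diagnostic sketch for checking what a bare `python` command would resolve to is given below. The function name `describe_python_resolution` is hypothetical, and the assumption that PATH, PYTHONHOME, or PYTHONPATH is what mixes the two versions here is a guess, not something confirmed in the thread.

```python
import os
import shutil


def describe_python_resolution(env=None):
    """Report what a bare `python` command would resolve to under `env`."""
    env = os.environ if env is None else env
    return {
        # Executable that a bare `python` lookup would find on PATH.
        "python_on_path": shutil.which("python", path=env.get("PATH", "")),
        # If set, these change where an interpreter looks for its modules,
        # which can pair a Python 2 binary with Python 3 library files.
        "PYTHONHOME": env.get("PYTHONHOME"),
        "PYTHONPATH": env.get("PYTHONPATH"),
        # PySpark settings mentioned in the question, shown for comparison.
        "PYSPARK_PYTHON": env.get("PYSPARK_PYTHON"),
        "PYSPARK_DRIVER_PYTHON": env.get("PYSPARK_DRIVER_PYTHON"),
    }


if __name__ == "__main__":
    for key, value in describe_python_resolution().items():
        print("{}: {}".format(key, value))
```

Running this in the same environment that launches spark-submit (for example, from the Airflow task) shows whether the values differ from those in an ordinary shell on the same host.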
Created on ‎07-22-2020 06:17 PM - edited ‎07-22-2020 06:17 PM
Problem solved: the issue was in topology.py, which uses python as its default interpreter. Despite all the environment variables pointing to python3, that still resolved to Python 2, so I ended up overriding the topology script with the path to python3.
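One way to read this fix is that the script's interpreter line was pinned to an explicit path instead of a bare `python`. A rough sketch of that idea follows. It assumes the script selects its interpreter through a first-line shebang and that the script itself runs under the chosen interpreter. The helper name `pin_shebang` and the path `/usr/bin/python3` are hypothetical and should be replaced with whatever applies on the cluster.

```python
def pin_shebang(script_text, interpreter):
    """Return script_text with its first line set to a shebang for `interpreter`."""
    lines = script_text.splitlines(keepends=True)
    shebang = "#!" + interpreter + "\n"
    if lines and lines[0].startswith("#!"):
        # Replace an existing shebang such as "#!/usr/bin/env python".
        lines[0] = shebang
    else:
        # No shebang present: add one at the top of the script.
        lines.insert(0, shebang)
    return "".join(lines)


# Example with an in-memory script; the interpreter path is an assumption.
original = "#!/usr/bin/env python\nimport sys\n"
pinned = pin_shebang(original, "/usr/bin/python3")
```

With an absolute interpreter path, the script no longer depends on how `python` resolves in the environment that spark-submit inherits from Airflow.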
