Usage of Python 2.7 in PySpark
Labels: Apache Spark
Created 12-29-2016 11:43 AM
HDP comes with Python 2.6, but for Spark jobs I would like to use Python 2.7. What changes do I need to make so that only Spark picks up the installed 2.7 version? Thanks.
Created 12-29-2016 11:46 AM
You need to add the options below to your spark-env.sh. You should then be able to run PySpark jobs on 2.7. Let me know if you face any issues.
export PYSPARK_PYTHON=/usr/local/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python2.7
export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=/usr/local/bin/python2.7"
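If you want to confirm which interpreter is actually picked up after the change, a quick check from a small PySpark script works. This is only a minimal sketch (the app name and the parallelize call are illustrative, not from this thread): it prints the driver's Python version and the version reported by an executor task.

import sys
from pyspark import SparkContext

sc = SparkContext(appName="python-version-check")  # hypothetical app name, just for this check
print(sys.version)  # interpreter running the driver; expect 2.7.x after the change
# Ask a single executor task to report its interpreter version as well
print(sc.parallelize([0], 1).map(lambda _: __import__("sys").version).first())
sc.stop()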
Created 12-29-2016 12:06 PM
@Sandeep Nemuri, thanks for your reply. Does this have to be done on all the nodes, or only on the node from which I'm launching the Spark jobs?
Created 12-29-2016 12:13 PM
If you run jobs in yarn-cluster mode, then the above three variables should be set on all the nodes. You can add them through Ambari so that it pushes the change to every node.
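As a per-application alternative (a sketch, not something stated in this thread: spark.executorEnv.* and spark.yarn.appMasterEnv.* are standard Spark-on-YARN properties, and the path below assumes the same /usr/local/bin/python2.7 install on every node), you can point the executors and the YARN application master at 2.7 from SparkConf when running in yarn-client mode. Note the driver itself still runs under whichever interpreter launched the script.

from pyspark import SparkConf, SparkContext

# Point executors and the YARN application master at the 2.7 interpreter.
# Assumes /usr/local/bin/python2.7 exists on every node, as in the exports above.
conf = (SparkConf()
        .setAppName("py27-per-app")  # hypothetical app name
        .set("spark.executorEnv.PYSPARK_PYTHON", "/usr/local/bin/python2.7")
        .set("spark.yarn.appMasterEnv.PYSPARK_PYTHON", "/usr/local/bin/python2.7"))
sc = SparkContext(conf=conf)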
