Created 12-29-2016 11:43 AM
HDP ships with Python 2.6, but for Spark jobs we would like to use Python 2.7.
What changes do we need to make so that only Spark picks up the installed 2.7 version? Thanks.
Created 12-29-2016 11:46 AM
You need to add the options below to your spark-env.sh. After that you should be able to run pyspark jobs on 2.7. Let me know if you face any issues.
export PYSPARK_PYTHON=/usr/local/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python2.7
export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=/usr/local/bin/python2.7"
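A quick way to confirm the change took effect is to print which interpreter the driver and an executor are using. This is only a minimal sketch, assuming you run it inside an interactive pyspark shell, where the SparkContext sc is already created for you.

# From a pyspark shell: confirm the driver runs on the 2.7 interpreter.
import sys
print(sys.executable)    # expect /usr/local/bin/python2.7
print(sys.version_info)  # expect (2, 7, ...)

# Confirm an executor task sees the same version (sc comes from the shell).
import platform
print(sc.parallelize([0]).map(lambda _: platform.python_version()).first())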
Created 12-29-2016 12:06 PM
@Sandeep Nemuri, thanks for your reply. Does it have to be done on all the nodes, or only on the node from which I'm launching the Spark jobs?
Created 12-29-2016 12:13 PM
If you run jobs in yarn-cluster mode, then the above three variables should be set on all the nodes. You can add them through Ambari so that it pushes the change to every node.
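As a sanity check after Ambari has pushed the change, a small job can report which Python each worker node actually uses. This is a sketch, not a definitive test: the script name, app name, and the assumption that 100 small tasks are enough to land on every node are all illustrative. In yarn-cluster mode the printed output ends up in the driver's YARN logs.

# version_check.py -- submit with spark-submit (e.g. --master yarn) after the
# env vars have been pushed to all nodes; it reports the Python version seen
# on each worker host.
import platform
import socket

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("python-version-check")
sc = SparkContext(conf=conf)

# Run enough small tasks to touch every node, then keep one record per host.
pairs = (sc.parallelize(range(100), 100)
           .map(lambda _: (socket.gethostname(), platform.python_version()))
           .distinct()
           .collect())

for host, version in sorted(pairs):
    print("%s -> %s" % (host, version))  # expect 2.7.x on every host

sc.stop()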