Usage of Python 2.7 in PySpark

Expert Contributor

HDP ships with Python 2.6, but for Spark jobs I would like to use Python 2.7.

What changes do we need to make so that only Spark picks up the installed 2.7 version? Thanks.

1 ACCEPTED SOLUTION

@nyadav

You need to add the options below to your spark-env.sh. You should then be able to run PySpark jobs on 2.7. Let me know if you face any issues.

export PYSPARK_PYTHON=/usr/local/bin/python2.7 
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python2.7 
export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=/usr/local/bin/python2.7"
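
To confirm that Spark actually picks up 2.7, a quick check from the pyspark shell (a minimal sketch, assuming a SparkContext named sc is already available) is:

import sys

# Python version used by the driver
print(sys.version)

# Python version used on the executors; runs sys.version inside a small job
print(sc.parallelize(range(2), 2).map(lambda _: sys.version).collect())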


3 REPLIES

Expert Contributor

@Sandeep Nemuri, thanks for your reply. Does this have to be done on all the nodes, or only on the node from which I'm launching the Spark jobs?

@nyadav

If you run jobs in yarn-cluster mode, then the above three variables should be set on all the nodes. You can add them through Ambari, which will push the changes to all the nodes.
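
If you only need 2.7 for particular jobs, a per-job alternative (just a sketch, assuming Spark on YARN and that python2.7 is installed at the same path on every node; my_job.py is a hypothetical script name) is to pass the interpreter through spark-submit configuration properties instead of spark-env.sh:

spark-submit \
  --master yarn --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/usr/local/bin/python2.7 \
  --conf spark.executorEnv.PYSPARK_PYTHON=/usr/local/bin/python2.7 \
  my_job.py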