
Usage of Python 2.7 version in Pyspark


HDP ships with Python 2.6, but for Spark jobs we would like to use Python 2.7.

What changes do we need to make so that only Spark picks up the installed Python 2.7? Thanks.

1 ACCEPTED SOLUTION

@nyadav

You need to add the options below to your spark-env.sh. You should then be able to run PySpark jobs on Python 2.7. Let me know if you face any issues.

export PYSPARK_PYTHON=/usr/local/bin/python2.7 
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python2.7 
export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=/usr/local/bin/python2.7"
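
To confirm the change took effect, a quick end-to-end check along the lines below can help. This is an illustrative sketch, not part of the original answer; the file name, app name, and submit options are placeholders. It prints the interpreter version seen by the driver and by an executor, and both should report 2.7.x once the variables above are in effect.

# Illustrative check (assumes spark-submit is on the PATH and a default master
# is configured; file and app names are made up for this example).
cat > /tmp/check_python.py <<'EOF'
import sys
from pyspark import SparkContext

sc = SparkContext(appName="python-version-check")
# Interpreter used by the driver process
print("driver  : %s" % sys.version)
# Interpreter used by an executor process
print("executor: %s" % sc.parallelize([0]).map(lambda _: sys.version).first())
sc.stop()
EOF

spark-submit /tmp/check_python.py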


3 REPLIES


@Sandeep Nemuri, thanks for your reply. Does this have to be done on all the nodes, or only on the node from which I'm launching the Spark jobs?

@nyadav

If you run jobs in yarn-cluster mode, the above three variables should be set on all the nodes. You can add them through Ambari, which will push the changes to all the nodes.
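
For completeness, the interpreter can also be selected per application at submit time rather than cluster-wide. This is a sketch not mentioned in the original answer; it assumes /usr/local/bin/python2.7 exists at the same path on every node, uses Spark 1.x submit syntax, and the job file name is a placeholder.

# Per-application alternative (illustrative sketch):
# spark.yarn.appMasterEnv.* sets environment variables for the driver running
# inside the YARN Application Master, and spark.executorEnv.* does the same
# for the executors.
spark-submit \
  --master yarn-cluster \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/usr/local/bin/python2.7 \
  --conf spark.executorEnv.PYSPARK_PYTHON=/usr/local/bin/python2.7 \
  my_job.py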