
multiple versions of python issues

Contributor

I have two versions of Python installed (2.6 and 2.7). Spark jobs run through the pyspark shell pick up one version of Python (2.7), but jobs submitted to the cluster via YARN pick up the 2.6 version. How can I get YARN jobs to point to the 2.7 version?
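To confirm which interpreter each mode actually resolves, a quick check along these lines can help (a sketch only: the /tmp/pyver.py path is arbitrary, spark-submit is assumed to be on the PATH, and on older Spark releases the master flag may need to be yarn-client):

# Write a tiny job that reports the Python version on the driver and executors
cat > /tmp/pyver.py <<'EOF'
import sys
from pyspark import SparkContext
sc = SparkContext()
print "driver python: %s" % sys.version
versions = sc.parallelize(range(2)).map(lambda _: __import__("sys").version).collect()
print "executor python: %s" % versions[0]
EOF
# Submit to YARN and compare the two versions printed in the output
spark-submit --master yarn /tmp/pyver.py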

1 ACCEPTED SOLUTION


@Jon Page Try setting these before running the spark-submit command:

export PYSPARK_DRIVER_PYTHON=/opt/anaconda2/bin/python

export PYSPARK_PYTHON=/opt/anaconda2/bin/python

/opt/anaconda2/bin/python should be the location of your Python 2.7 interpreter (this path should be the same across all cluster nodes).
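For a complete run this might look like the sketch below (my_job.py is a placeholder for your application; note that in yarn-cluster mode the driver runs inside the cluster, so the setting may also need to be forwarded through Spark's spark.yarn.appMasterEnv mechanism):

export PYSPARK_DRIVER_PYTHON=/opt/anaconda2/bin/python
export PYSPARK_PYTHON=/opt/anaconda2/bin/python
# Client mode: the exported variables reach both the driver and the executors
spark-submit --master yarn my_job.py
# Cluster mode: also forward the interpreter to the application master
spark-submit --master yarn --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/anaconda2/bin/python \
  my_job.py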

HTH


Contributor

Thanks, this did work for me!

Is there a way to configure the Hadoop cluster itself to use a specific installed version of Python?
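One cluster-wide option (a sketch only: the config path varies by distribution, and on managed clusters these values are typically set through Cloudera Manager or Ambari rather than by editing files directly) is to export the variables in spark-env.sh on every node, so all jobs inherit them:

# In spark-env.sh on every node (e.g. /etc/spark/conf/spark-env.sh;
# the exact location depends on your distribution)
export PYSPARK_PYTHON=/opt/anaconda2/bin/python
export PYSPARK_DRIVER_PYTHON=/opt/anaconda2/bin/python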