Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference; information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Multiple versions of Python issue

Contributor

I have two versions of Python installed (2.6 and 2.7). Spark jobs run through the shell in pyspark pick up one version of Python (2.7), while jobs submitted to the cluster via YARN pick up the 2.6 version. How can I get YARN jobs to point to the 2.7 version?
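One way to see which interpreter a given context resolves to (a diagnostic sketch, not from the thread; the `python`/`python3` command names are the usual defaults and may differ on your nodes) is to run the same check from the pyspark shell's environment and from a YARN container's environment and compare:

```shell
# Find the default interpreter on this node and print its path and version.
# Run this in both contexts (shell session vs. YARN job) to see the mismatch.
PY=$(command -v python || command -v python3)
"$PY" -c 'import sys; print(sys.executable); print("%d.%d" % sys.version_info[:2])'
```

If the two contexts print different paths or versions, the environment variables discussed below are the usual lever for making them agree.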

1 ACCEPTED SOLUTION


@Jon Page Try setting these environment variables before running the spark-submit command:

export PYSPARK_DRIVER_PYTHON=/opt/anaconda2/bin/python

export PYSPARK_PYTHON=/opt/anaconda2/bin/python

/opt/anaconda2/bin/python should be the location of your Python 2.7 interpreter (the path should be the same on all cluster nodes).
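Put together, a session might look like this (a sketch; the anaconda path is the example from this answer and should be replaced with your own Python 2.7 install location):

```shell
# Point both the pyspark driver and the YARN executors at the same interpreter.
# /opt/anaconda2/bin/python is an example path; it must exist on every node.
export PYSPARK_DRIVER_PYTHON=/opt/anaconda2/bin/python
export PYSPARK_PYTHON=/opt/anaconda2/bin/python

# Confirm both variables are set before submitting.
echo "driver:   $PYSPARK_DRIVER_PYTHON"
echo "executor: $PYSPARK_PYTHON"
```

Run spark-submit from the same shell session afterwards, so the submitted job inherits both variables.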

HTH

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.


2 REPLIES


Contributor

Thanks, this did work for me!

Is there a way to configure the Hadoop cluster to use a specific installed version of Python?
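The thread leaves this question open. One common approach (an assumption here, not confirmed by these replies) is to set the interpreter once in Spark's conf/spark-env.sh on every node, so jobs no longer depend on per-shell exports:

```shell
# Fragment for conf/spark-env.sh on each node.
# The anaconda path is illustrative; use your own Python 2.7 location.
export PYSPARK_PYTHON=/opt/anaconda2/bin/python
export PYSPARK_DRIVER_PYTHON=/opt/anaconda2/bin/python
```

For YARN specifically, the per-job equivalents are the spark.yarn.appMasterEnv.PYSPARK_PYTHON and spark.executorEnv.PYSPARK_PYTHON properties, which can be set in spark-defaults.conf or passed via --conf on spark-submit.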