Multiple versions of python on spark client


Hi,

Is it possible to use a PySpark client on CentOS 7 (Python 2.7) with a YARN cluster running HDP 2.5 on CentOS 6 (Python 2.6)?

Best Regards

Gerald

4 REPLIES

Super Collaborator
@Gerald BIDAULT

I guess this is not possible. If the driver and the workers use two different Python versions, the application will fail with the exception: "Exception: Python in worker has different version 2.6 than that in driver 2.7, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set."

You can also refer to this question: https://community.hortonworks.com/questions/101952/zeppelin-pyspark-cannot-run-with-different-minor-...
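
For reference, a minimal sketch of how these variables can be set on the client before submitting a job (the interpreter path and script name below are assumptions; adjust them to your environment):

# Assumption: Python 2.7 is installed at the same path on the client and on every NodeManager host
export PYSPARK_PYTHON=/usr/bin/python2.7            # interpreter used by the YARN executors
export PYSPARK_DRIVER_PYTHON=/usr/bin/python2.7     # interpreter used by the driver on the client
spark-submit --master yarn --deploy-mode client my_job.py   # my_job.py is a placeholder script name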


@ssharma,

Thanks for your answer. My problem is that one client needs to use OrientDB and the pyorient connector. The issue is that pyorient isn't compatible with Python 2.6, so we can't use it in our Python program.

Best Regards

Gérald

Super Collaborator

@Gerald BIDAULT

Is it feasible to install Python 2.7 on your CentOS 6 cluster?

If you can install Python 2.7, modify spark-env.sh to use it by setting the properties below:

export PYSPARK_PYTHON=<path to python 2.7>
export PYSPARK_DRIVER_PYTHON=python2.7

Steps for changing spark-env.sh:

1) Log in to Ambari

2) Navigate to the Spark service

3) Under 'Advanced spark2-env', modify 'content' to add the properties described above.

Attaching screenshot: spark-changes.png
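
As a rough illustration (not the exact Ambari template), the lines added to the 'content' field might look like the following, assuming Python 2.7 is installed under /usr/local/bin on every node; after saving, restart the Spark service from Ambari so the change takes effect:

# Appended to Advanced spark2-env -> content (install path is an assumption)
export PYSPARK_PYTHON=/usr/local/bin/python2.7          # Python used by the executors
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python2.7   # Python used by the driver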


@ssharma,

Thanks for your answer.

If I do that, what would the impact be on other HDP services that use Python 2.6?

Best Regards

Gérald