Is it possible to use the pyspark client on CentOS 7 (Python 2.7) with an HDP 2.5 YARN cluster on CentOS 6 (Python 2.6)?
I guess this is not possible. If the driver and the workers run two different Python versions, the application will fail with the exception: "Exception: Python in worker has different version 2.6 than that in driver 2.7, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set."
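To illustrate what that error means, here is a minimal sketch (not PySpark's actual source; `check_python_versions` is a hypothetical helper) of the kind of check PySpark performs at worker startup: only the major.minor version must match, so 2.7.5 vs 2.7.12 is fine, but 2.7 vs 2.6 fails.

```python
def check_python_versions(driver_version, worker_version):
    """Hypothetical illustration of PySpark's version check: compare the
    driver and worker interpreters on major.minor only, given as tuples
    like (2, 7). Raise when they differ, as PySpark does."""
    if driver_version[:2] != worker_version[:2]:
        raise Exception(
            "Python in worker has different version %d.%d than that in driver %d.%d, "
            "PySpark cannot run with different minor versions."
            % (worker_version[0], worker_version[1],
               driver_version[0], driver_version[1]))

# A CentOS 7 driver on Python 2.7 with CentOS 6 workers on Python 2.6
# reproduces the failure described above:
try:
    check_python_versions(driver_version=(2, 7), worker_version=(2, 6))
except Exception as e:
    print(e)
```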
You can also refer to this question: https://community.hortonworks.com/questions/101952/zeppelin-pyspark-cannot-run-with-different-minor-...
Thanks for your answer. My problem is that one client needs to use OrientDB and the pyorient connector. The issue is that pyorient isn't compatible with Python 2.6, so we can't integrate pyorient into the Python program.
Is it feasible to install Python 2.7 on your CentOS 6 cluster?
If you can install Python 2.7, then modify spark-env.sh to use it by adding the properties below:
export PYSPARK_PYTHON=<path to python 2.7>
export PYSPARK_DRIVER_PYTHON=python2.7
Steps for changing spark-env.sh:
1) Log in to Ambari
2) Navigate to the Spark service
3) Under 'Advanced spark2-env', modify 'content' to add the properties described above.
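After applying the change in Ambari and restarting Spark, it is worth confirming on each node that PYSPARK_PYTHON actually resolves to a 2.7 interpreter. A small sketch, assuming the variable has been exported into the shell environment (the fallback to the current interpreter when it is unset is only for illustration):

```python
import os
import subprocess
import sys


def pyspark_python_version():
    """Report the major.minor version of the interpreter that
    PYSPARK_PYTHON points to, by asking that interpreter directly.
    Falls back to the current interpreter when the variable is unset
    (an assumption for illustration only)."""
    interpreter = os.environ.get("PYSPARK_PYTHON", sys.executable)
    out = subprocess.check_output(
        [interpreter, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"])
    return out.decode().strip()


print(pyspark_python_version())  # e.g. '2.7' once the change is in place
```

Running this on every worker node is a quick way to catch a node where the spark-env.sh change did not take effect.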