Created 04-27-2018 11:11 AM
Hi.
I built a cluster with HDP (Ambari version 2.6.1.5), and I am using Anaconda3 as my Python interpreter.
I am having trouble changing the Python version used by the Spark2 PySpark interpreter in Zeppelin.
When I check the Python version of Spark2 from the pyspark shell, it shows the following, which looks correct to me:
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.0.2.6.4.0-91
      /_/

Using Python version 3.6.4 (default, Jan 16 2018 18:10:19)
SparkSession available as 'spark'.
>>> import sys
>>> print (sys.path)
['', '/tmp/spark-14a0fb52-5fea-4c1f-bf6b-c0bd0c37eedf/userFiles-54205d05-fbf0-4ec1-b274-4c5a2b78e840', '/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip', '/usr/hdp/current/spark2-client/python', '/root', '/root/anaconda3/lib/python36.zip', '/root/anaconda3/lib/python3.6', '/root/anaconda3/lib/python3.6/lib-dynload', '/root/anaconda3/lib/python3.6/site-packages']
>>> print (sys.version)
3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19)
[GCC 7.2.0]
>>> exit()
When I check the Python version of Spark2 from Zeppelin, however, the results are different:
%spark2.pyspark
print(sc.version)
import sys
print(sys.version)
print()
print(sys.path)

Result:

2.2.0.2.6.4.0-91
2.7.5 (default, Aug 4 2017, 00:39:18)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]
()
['/tmp', u'/tmp/spark-75f5d1d5-fefa-4dc8-bc9b-c797dec106d7/userFiles-1c25cf01-7758-49dd-a1eb-f1fbd084e9af/py4j-0.10.4-src.zip', u'/tmp/spark-75f5d1d5-fefa-4dc8-bc9b-c797dec106d7/userFiles-1c25cf01-7758-49dd-a1eb-f1fbd084e9af/pyspark.zip', u'/tmp/spark-75f5d1d5-fefa-4dc8-bc9b-c797dec106d7/userFiles-1c25cf01-7758-49dd-a1eb-f1fbd084e9af', '/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip', '/usr/hdp/current/spark2-client/python', '/usr/hdp/current/spark2-client/python/lib/py4j-0.8.2.1-src.zip', '/usr/lib64/python27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7/site-packages', '/usr/lib64/python2.7/site-packages/gtk-2.0', '/usr/lib/python2.7/site-packages']
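To compare the two environments directly, a small standard-library helper like the one below could be pasted into both the pyspark shell and a %spark2.pyspark paragraph. The name `describe_interpreter` is hypothetical, and the Anaconda check is only a heuristic based on the `|Anaconda, Inc.|` marker visible in the shell output.

```python
import sys


def describe_interpreter():
    """Summarise the Python interpreter running this code (hypothetical helper)."""
    return {
        # Absolute path of the running binary, e.g. /root/anaconda3/bin/python
        "executable": sys.executable,
        # (major, minor, micro), e.g. (2, 7, 5) for the CentOS 7 system Python
        "version": tuple(sys.version_info[:3]),
        # Heuristic only: Anaconda builds include "Anaconda" in sys.version
        "anaconda": "Anaconda" in sys.version,
        # Installation prefix: /usr for the system Python, the conda root for Anaconda
        "prefix": sys.prefix,
    }


print(describe_interpreter())
```

If the Zeppelin paragraph reports a /usr prefix and a 2.7 version while the shell reports the Anaconda prefix, the two are launching different interpreters.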
I have tried updating the Zeppelin interpreter settings as suggested in other questions and answers, for example:
export PYSPARK_PYTHON=/root/anaconda3/bin/python
I updated both zeppelin-env.sh and the interpreter settings in the Zeppelin GUI, but it didn't work.
I think this happens because Zeppelin's Python path points to /usr/lib64/python2.7, which is the system Python on CentOS, but I don't know how to fix it.
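For context, Spark picks the driver's Python binary from a chain of settings. The function below is a simplified model, assuming the Spark 2.x properties `spark.pyspark.driver.python` and `spark.pyspark.python` and the `PYSPARK_DRIVER_PYTHON` / `PYSPARK_PYTHON` environment variables, checked in that order. Zeppelin's spark2 interpreter may apply its own property (such as `zeppelin.pyspark.python`) as well, so treat this as an illustration of the fallback behaviour, not as Zeppelin's exact logic. The function name is hypothetical.

```python
import os


def resolve_driver_python(spark_conf, env=None):
    """Simplified, assumed model of how a PySpark driver picks its Python binary.

    Order (highest precedence first):
      1. spark.pyspark.driver.python  (Spark property)
      2. spark.pyspark.python         (Spark property)
      3. PYSPARK_DRIVER_PYTHON        (environment variable)
      4. PYSPARK_PYTHON               (environment variable)
      5. "python"                     (whatever is first on PATH)
    """
    if env is None:
        env = os.environ
    candidates = [
        spark_conf.get("spark.pyspark.driver.python"),
        spark_conf.get("spark.pyspark.python"),
        env.get("PYSPARK_DRIVER_PYTHON"),
        env.get("PYSPARK_PYTHON"),
    ]
    for value in candidates:
        if value:  # skip settings that are unset or empty
            return value
    return "python"


# resolve_driver_python({}, {"PYSPARK_PYTHON": "/root/anaconda3/bin/python"})  # -> '/root/anaconda3/bin/python'
# resolve_driver_python({}, {})                                               # -> 'python'
```

With nothing set, the model falls back to plain `python`, which on CentOS 7 is the system Python 2.7; that matches the /usr/lib64/python2.7 entries in the Zeppelin output. An environment variable also only reaches the interpreter if it is exported (with no spaces around `=`) before the interpreter process starts, so the interpreter generally has to be restarted after editing zeppelin-env.sh.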
If you have any ideas about this problem, please let me know. Any advice would be appreciated.
Thank you.
Created on 05-29-2018 02:10 PM - edited 08-17-2019 06:22 PM
Try installing Anaconda3 under /opt/anaconda3 instead of under /root, and add the following configuration to your interpreter:
With this configuration in place, the results are:
Important: since Zeppelin runs the spark2 interpreter in yarn-client mode by default, you need to make sure that /root/anaconda3/bin/python3 is installed on the Zeppelin machine and on all cluster worker nodes.
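In yarn-client mode the driver runs on the Zeppelin host while the executors start Python workers on the worker nodes, so the configured path has to point to a working interpreter on every host. A per-node check could be sketched as follows; the helper name and the idea of running it on each host (for example over ssh) are assumptions, not part of the original answer.

```python
import os
import subprocess
import sys


def probe_python(path):
    """Return (major, minor) of the interpreter at `path`, or None if unusable."""
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        return None  # missing or not executable on this node
    try:
        out = subprocess.check_output(
            [path, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"]
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # the file exists but could not run the probe
    major, minor = out.decode().strip().split(".")
    return int(major), int(minor)


if __name__ == "__main__":
    # Usage on each node: python probe.py /root/anaconda3/bin/python3
    target = sys.argv[1] if len(sys.argv) > 1 else sys.executable
    print(target, probe_python(target))
```

A `None` on any node means executors there would fail to start the configured interpreter.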
HTH
Created 04-28-2018 09:49 AM
Hi @Sungwoo Park,
You can have a look at this question; I think it will help you: https://stackoverflow.com/questions/47198678/zeppelin-python-conda-and-python-sql-interpreters-do-no...
Best regards,
Paul
Created 05-29-2018 06:11 AM
Thank you for your comment.
I checked the post you mentioned and found that its approach, changing the symlink in /bin, is not a good idea.
It might cause problems for the Linux system.
Created 05-29-2018 04:02 PM
Hi @Sungwoo Park, thanks for the input. Could you please elaborate a bit on why the symlink could cause problems, and which ones?
I am very interested, since we have this setting in a demo cluster at a customer.
BR, Paul
Created 05-30-2018 01:34 AM
First of all, my problem was solved by adding the Zeppelin interpreter properties that @Felix Albani showed me.
In my case, my cluster is based on CentOS 7.
The OS ships Python 2.7 as its default, and some packages, such as yum, depend on that default Python. The symlink /bin/python points to this default Python, and if it is changed, yum stops working.
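To see what /bin/python points to without touching it, the chain of links can be followed read-only. This is a hypothetical helper, not a CentOS tool; on a stock CentOS 7 host the chain typically ends at the system python2.7 binary that yum relies on, so Anaconda is better reached through its full path or the interpreter settings than by repointing the link.

```python
import os
import sys


def symlink_chain(path):
    """Follow a chain of symlinks without modifying anything.

    Returns every path visited, ending at the first non-link (typically
    .../python -> .../python2 -> .../python2.7 on CentOS 7).
    """
    chain = [path]
    seen = set(chain)
    while os.path.islink(path):
        target = os.readlink(path)
        # Relative link targets are resolved against the link's own directory
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        if path in seen:  # guard against symlink loops
            break
        chain.append(path)
        seen.add(path)
    return chain


if __name__ == "__main__":
    start = sys.argv[1] if len(sys.argv) > 1 else "/usr/bin/python"
    print(" -> ".join(symlink_chain(start)))
```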
Hope this helps.
SW
Created 05-31-2018 09:34 AM
@Felix Albani Hi Felix, you installed Python 3.6.4, but according to the documentation Spark2 only supports Python up to 3.4.x. Can you kindly explain how this works?