Version of Python of Pyspark for Spark2 and Zeppelin

Explorer

Hi.

I built a cluster with HDP using Ambari version 2.6.1.5, and I am using Anaconda3 as my Python interpreter.

I am having trouble changing the Python version used by Spark2 PySpark in Zeppelin.

When I check the Python version of Spark2 from the pyspark shell, it shows the output below, which looks correct to me.

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.0.2.6.4.0-91
      /_/

Using Python version 3.6.4 (default, Jan 16 2018 18:10:19)
SparkSession available as 'spark'.
>>> import sys
>>> print (sys.path)
['', '/tmp/spark-14a0fb52-5fea-4c1f-bf6b-c0bd0c37eedf/userFiles-54205d05-fbf0-4ec1-b274-4c5a2b78e840', '/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip', '/usr/hdp/current/spark2-client/python', '/root', '/root/anaconda3/lib/python36.zip', '/root/anaconda3/lib/python3.6', '/root/anaconda3/lib/python3.6/lib-dynload', '/root/anaconda3/lib/python3.6/site-packages']
>>> print (sys.version)
3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19)
[GCC 7.2.0]
>>> exit()

When I check the Python version of Spark2 from Zeppelin, it shows a different result, as below.

%spark2.pyspark
print(sc.version)
import sys
print(sys.version)
print()
print(sys.path)

2.2.0.2.6.4.0-91
2.7.5 (default, Aug  4 2017, 00:39:18) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]
()
['/tmp', u'/tmp/spark-75f5d1d5-fefa-4dc8-bc9b-c797dec106d7/userFiles-1c25cf01-7758-49dd-a1eb-f1fbd084e9af/py4j-0.10.4-src.zip', u'/tmp/spark-75f5d1d5-fefa-4dc8-bc9b-c797dec106d7/userFiles-1c25cf01-7758-49dd-a1eb-f1fbd084e9af/pyspark.zip', u'/tmp/spark-75f5d1d5-fefa-4dc8-bc9b-c797dec106d7/userFiles-1c25cf01-7758-49dd-a1eb-f1fbd084e9af', '/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip', '/usr/hdp/current/spark2-client/python', '/usr/hdp/current/spark2-client/python/lib/py4j-0.8.2.1-src.zip', '/usr/lib64/python27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7/site-packages', '/usr/lib64/python2.7/site-packages/gtk-2.0', '/usr/lib/python2.7/site-packages']

I have tried updating the Zeppelin interpreter settings as suggested in other questions and answers, for example:

export PYSPARK_PYTHON=/root/anaconda3/bin/python

I updated both zeppelin-env.sh and the interpreter settings via the Zeppelin GUI, but it didn't work.

I think this happens because Zeppelin's Python path points to /usr/lib64/python2.7, which is the CentOS system default, but I don't know how to fix it.

If you have any ideas about this problem, please let me know. Any advice would be appreciated.

Thank you.

1 ACCEPTED SOLUTION


@Sungwoo Park

Try installing Anaconda3 under /opt/anaconda3 instead of under /root, and add the following configuration to your interpreter:

76467-screen-shot-2018-05-29-at-100214-am.png

The result with this configuration is:

76468-screen-shot-2018-05-29-at-100156-am.png

Important: Since Zeppelin runs the spark2 interpreter in yarn-client mode by default, you need to make sure that /root/anaconda3/bin/python3 is installed on the Zeppelin machine and on all cluster worker nodes.
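
One way to confirm that the driver and the worker nodes really use the same interpreter is sketched below. This is only an illustration, not part of the original answer: the helper name interpreter_info is made up for this example, and the commented-out line assumes it is run in a %spark2.pyspark paragraph where Zeppelin provides `sc`.

```python
import os
import sys


def interpreter_info(_=None):
    """Describe the Python process this function runs in.

    The unused argument lets the function be passed directly to RDD.map().
    (interpreter_info is a hypothetical helper name, not a Spark or Zeppelin API.)
    """
    return (
        sys.executable,                               # path of the running interpreter
        sys.version.split()[0],                       # e.g. "3.6.4" or "2.7.5"
        os.environ.get("PYSPARK_PYTHON", "<unset>"),  # may be unset in some processes
    )


# Driver side: works in any Python session.
print("driver: %s" % (interpreter_info(),))

# Executor side: uncomment inside a %spark2.pyspark paragraph (assumes `sc`).
# More than one distinct tuple in the result suggests the worker nodes
# are not all running the same interpreter.
# print(sc.parallelize(range(100), 10).map(interpreter_info).distinct().collect())
```

If the driver line still reports /usr/bin/python or a 2.7.x version after changing the interpreter settings, the new setting has probably not been picked up; restarting the interpreter from the Zeppelin GUI is usually needed before a change takes effect.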

Additional resources

https://community.hortonworks.com/content/supportkb/146508/how-to-use-alternate-python-version-for-s...

HTH



6 REPLIES

Rising Star

Hi @Sungwoo Park,

You can have a look at this question. I think it would help you: https://stackoverflow.com/questions/47198678/zeppelin-python-conda-and-python-sql-interpreters-do-no...

Best regards,

Paul

Explorer

Hi @Paul Hernandez

Thank you for your comment.

I checked the post you pointed me to, and I don't think its approach, changing the symlink in bin/, is a good idea.

It might cause trouble for the Linux system.

Rising Star

Hi @Sungwoo Park, thanks for the input. Could you please elaborate a little more on why the symlink could cause problems, and which ones?

I am very interested, since we have this setting in a demo cluster at a customer site.

BR. Paul

Explorer

Hi @Paul Hernandez

First of all, my problem was solved by adding the Zeppelin properties that @Felix Albani showed me.

In my case, the cluster is based on CentOS 7.

The OS ships with Python 2.7 as the default, and some packages such as yum depend on that default Python. The symlink '/bin/python' points to this default Python, and if it is changed, yum stops working.

Hope this helps.
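
To see what the system symlink currently resolves to without changing anything, a read-only check along these lines can help. It is a minimal sketch: the helper name describe_python_link is made up for this example, and '/bin/python' is the path from my CentOS 7 cluster, which may not exist on other systems.

```python
import os
import sys


def describe_python_link(path="/bin/python"):
    """Return (path, resolved_target, target_exists) without modifying anything.

    describe_python_link is a hypothetical helper name used for illustration.
    """
    resolved = os.path.realpath(path)  # follows every symlink in the chain
    return path, resolved, os.path.exists(resolved)


# The system default that tools such as yum rely on.
print(describe_python_link())

# The interpreter running this snippet, for comparison.
print(describe_python_link(sys.executable))
```

If the first line resolves to an Anaconda path, the system default has already been redirected, which is the situation that breaks yum.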

SW

Explorer

@Felix Albani Hi Felix, you installed 3.6.4, but according to the documentation Spark2 only supports Python up to 3.4.x. Could you kindly explain how this works?