
Error running Zeppelin pyspark interpreter with Python3

I'm trying to use the Zeppelin pyspark interpreter with python3. I set the "python" parameter in the interpreter to my python3 path and installed python3 on all worker nodes in the cluster at the same path, but I get an error when running simple commands:

%pyspark
file = sc.textFile("/data/x1")
file.take(3)
Exception: Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions
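
For context, the exception above is raised when the worker's Python `major.minor` version does not match the driver's. The following is only an illustrative sketch of that kind of comparison, not PySpark's actual source; the function name `check_worker_version` is hypothetical.

```python
import sys


def check_worker_version(driver_version, worker_version_info=sys.version_info):
    """Raise if the worker's major.minor differs from the driver's.

    Illustrative only: `check_worker_version` is a hypothetical name,
    not a PySpark API.
    """
    worker_version = "%d.%d" % tuple(worker_version_info[:2])
    if worker_version != driver_version:
        raise Exception(
            "Python in worker has different version %s than that in driver %s, "
            "PySpark cannot run with different minor versions"
            % (worker_version, driver_version)
        )
    return worker_version


# A 3.5 driver talking to a 2.7 worker reproduces the message from the question.
try:
    check_worker_version("3.5", (2, 7, 5))
except Exception as err:
    print(err)
```

Only the first two version components are compared in this sketch, so a 3.5.2 worker and a 3.5.3 driver would pass.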

It works from the command line when I run "pyspark" after exporting PYSPARK_PYTHON set to my python3 path. How do I tell Zeppelin to do the same? I haven't changed anything else. As a next step, I'd like to create two Spark interpreters, one running on python2 and the other on python3.


@Predrag Minovic

I tried the following settings.

1) Install python3.5 on all cluster nodes. I have a CentOS 7 based cluster and used these instructions: https://www.digitalocean.com/community/tutorials/how-to-install-python-3-and-set-up-a-local-programm...

[root@ctr-XXXX ~]# which python3.5
/usr/bin/python3.5
[root@ctr-XXXX~]# python3.5 --version
Python 3.5.3

2) In zeppelin-env.sh

I added this property

export PYSPARK_PYTHON=/usr/bin/python3.5

3) Modified my Zeppelin spark interpreter from the GUI

14456-screen-shot-2017-04-06-at-44625-pm.png

After that, if I run the following paragraph, it prints Python 3.5.3 as its current version:

14457-screen-shot-2017-04-06-at-44743-pm.png
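
The screenshot is not reproduced here. A paragraph along the following lines would report the Python version in use. It is a minimal sketch, and the helper name `python_version_string` is hypothetical. In Zeppelin, `%pyspark` goes on the first line of the paragraph. The worker check assumes the `sc` variable that Zeppelin provides in pyspark paragraphs and is skipped when `sc` is not defined.

```python
import platform
import sys


def python_version_string():
    """Return the version of the Python running this code, e.g. '3.5.3'."""
    return platform.python_version()


# Version and path of the driver-side Python used by the interpreter.
print("driver: %s at %s" % (python_version_string(), sys.executable))

# `sc` is assumed to be the SparkContext available in a %pyspark paragraph.
if "sc" in globals():
    worker_versions = (
        sc.parallelize(range(4), 4)
        .map(lambda _: platform.python_version())
        .distinct()
        .collect()
    )
    print("workers: %s" % worker_versions)
```

If the workers still run a different minor version, the worker step fails with the same exception as in the question instead of printing a list.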

@Predrag Minovic Can you please try the above steps and accept the answer if it works for you? Thanks!

Thanks for your reply, but your solution forces all Zeppelin interpreters to use py3. I want to have interpreters running both py2 and py3. I was able to set livy.pyspark to work on py3, and I'm looking for a setup that lets the spark.pyspark interpreter work on py3 as well.

New Contributor

Did you find a solution for this? I need it too!

Cloudera Employee

Hi @Andrey Ne

The following solution worked for me. I added these two properties on my customized %spark2py3 interpreter.

PYSPARK_DRIVER_PYTHON /usr/local/anaconda3/bin/python3
PYSPARK_PYTHON /usr/local/anaconda3/bin/python3

71393-screen-shot-2018-04-23-at-30612-pm.png
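
To confirm that the two properties took effect, a paragraph run under the customized interpreter can inspect its own environment. This is a sketch under the assumption that Zeppelin exports these upper-case interpreter properties as environment variables of the interpreter process. `check_pyspark_env` is a hypothetical helper, not a Zeppelin or Spark API.

```python
import os


def check_pyspark_env(env=None):
    """Return a list of problems found with the PySpark Python settings."""
    env = os.environ if env is None else env
    problems = []
    driver = env.get("PYSPARK_DRIVER_PYTHON")
    worker = env.get("PYSPARK_PYTHON")
    if not driver:
        problems.append("PYSPARK_DRIVER_PYTHON is not set")
    if not worker:
        problems.append("PYSPARK_PYTHON is not set")
    if driver and worker and driver != worker:
        # Different paths are only safe if both are the same minor version.
        problems.append(
            "driver and worker use different paths: %s vs %s" % (driver, worker)
        )
    return problems


print(check_pyspark_env() or "PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON agree")
```

Running the same paragraph under the default interpreter and under the customized one should show whether each interpreter picks up its own settings.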