Trying to use the Zeppelin pyspark interpreter with Python 3, I set the "python" parameter in the interpreter settings to my python3 path, and installed Python 3 at the same path on all worker nodes in the cluster. I get an error when running simple commands:
%pyspark
file = sc.textFile("/data/x1")
file.take(3)

Exception: Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions
It works from the command line: after exporting PYSPARK_PYTHON set to my python3 path, "pyspark" starts with Python 3. But how do I tell Zeppelin this? I haven't changed anything else. As a next step I'd actually like to create two Spark interpreters, one running on Python 2 and another on Python 3.
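For reference, the command-line setup that works is along these lines (the path is from my cluster; adjust to wherever your python3 lives):

```shell
# Point PySpark at the Python 3 interpreter before launching the shell
export PYSPARK_PYTHON=/usr/bin/python3.5
```

Then start `pyspark` as usual and both driver and workers pick up the same Python.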
I tried the following settings.
1) Installed python3.5 on all my cluster nodes (I have a CentOS 7 based cluster, and I used these instructions: https://www.digitalocean.com/community/tutorials/how-to-install-python-3-and-set-up-a-local-programm...
[root@ctr-XXXX ~]# which python3.5
/usr/bin/python3.5
[root@ctr-XXXX ~]# python3.5 --version
Python 3.5.3
2) In zeppelin-env.sh
I added this line (note: no spaces around `=`, or the shell rejects the export):

export PYSPARK_PYTHON=/usr/bin/python3.5
3) Modified my Zeppelin Spark interpreter from the GUI.
After that, if I run the following paragraph, it prints Python 3.5.3 as its current version.
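The paragraph itself is just a version check; a minimal sketch of what I ran (in Zeppelin this goes under the %pyspark interpreter):

```python
# Print the Python version the interpreter process is running on
import sys
print(sys.version)  # e.g. "3.5.3 (...)" when the interpreter picks up python3.5
```

If this still prints 2.7.x, the interpreter is not picking up the new setting.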
Thanks for your reply, but your solution fixes all Zeppelin interpreters to use py3. I want to have interpreters running both py2 and py3. I was able to set livy.pyspark to work on py3, and I'm looking for a setup that enables the spark.pyspark interpreter to work on py3.
Hi @Andrey Ne
The following solution worked for me. I added these two properties to my customized %spark2py3 interpreter.
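(The property list didn't survive in this copy of the post. For reference, and as an assumption about what was meant: the per-interpreter settings that typically control this are Zeppelin's `zeppelin.pyspark.python` and Spark's `spark.pyspark.python`, both pointed at the Python 3 binary, e.g.:)

```
zeppelin.pyspark.python = /usr/bin/python3.5
spark.pyspark.python   = /usr/bin/python3.5
```

Because these live on one interpreter's settings rather than in zeppelin-env.sh, other interpreters can keep using Python 2.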