
Error running spark2 interpreter in Zeppelin with pyspark due to matplotlib


I installed the sandbox using the Docker for Windows image.

Sandbox information:
Created on: 01_02_2018_10_47_41
Hadoop stack version: Hadoop 2.7.3.2.6.4.0-91
Ambari Version: 2.6.1.0-143
Ambari Hash: 2989989d67edacff7e9db702b4cf0c080556dddc
Ambari build: Release : 143
Java version: 1.8.0_161
OS Version: CentOS release 6.9 (Final)

In Zeppelin, when I attempt to use the spark2 interpreter with pyspark:

%pyspark

x = 1

(run)

I get the following error:

/usr/hdp/current/spark2-client/python/pyspark/context.py:205: UserWarning: Support for Python 2.6 is deprecated as of Spark 2.0.0
  warnings.warn("Support for Python 2.6 is deprecated as of Spark 2.0.0")
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-3334147833533366579.py", line 302, in <module>
    __zeppelin__._setup_matplotlib()
  File "/tmp/zeppelin_pyspark-3334147833533366579.py", line 141, in _setup_matplotlib
    import backend_zinline
  File "/usr/hdp/current/zeppelin-server/interpreter/lib/python/backend_zinline.py", line 30, in <module>
    import mpl_config
  File "/usr/hdp/current/zeppelin-server/interpreter/lib/python/mpl_config.py", line 99, in <module>
    _init_config()
  File "/usr/hdp/current/zeppelin-server/interpreter/lib/python/mpl_config.py", line 83, in _init_config
    fmt = matplotlib.rcParams['savefig.format']
KeyError: 'savefig.format'


@Patrick Young Can you check your python version on the cluster nodes? Is it 2.6 by any chance?
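One way to check this, as a rough sketch rather than an official diagnostic: run the script below with the same Python that Zeppelin uses. It reports the interpreter version and whether the installed matplotlib has the 'savefig.format' key that mpl_config.py reads in the traceback above. The function name describe_environment is made up for this example, and the script is written to run on both Python 2.6 and Python 3.

```python
from __future__ import print_function  # keeps print() working on Python 2.6+

import sys


def describe_environment():
    """Collect the interpreter and matplotlib details relevant to the KeyError."""
    info = {
        "python": "{0}.{1}.{2}".format(*sys.version_info[:3]),
        "executable": sys.executable,
        "matplotlib": None,
        "has_savefig_format": None,
    }
    try:
        import matplotlib
    except ImportError:
        # matplotlib is not installed for this interpreter
        return info
    info["matplotlib"] = matplotlib.__version__
    # mpl_config.py reads this key; if it is missing, the KeyError is raised.
    info["has_savefig_format"] = "savefig.format" in matplotlib.rcParams
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("{0}: {1}".format(key, value))
```

If has_savefig_format comes back False, the matplotlib seen by that interpreter does not provide the setting Zeppelin expects, which matches the error in the question.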


In the HDP shell, python --version reports 2.6.6.

I am confused why the sandbox image would use a deprecated version of python. If it needs python3 to work properly, then python3 should be installed by default when the Virtual Machine is first created.

I tried installing python3 on CentOS and restarting the HDP services, but without success.

Could you please give me step-by-step instructions to go from a fresh install on VMWare Workstation, with the error detailed above, to a state where I am able to use Zeppelin with spark2 and pyspark? It would be much appreciated. Thank you.

Hi @Patrick Young

You need to follow several steps to make this work.

About python:

- I installed anaconda3. The critical step is not to let anaconda3 be added to the environment variables: the HDP platform needs Python 2 for some of its scripts, so the python path must still resolve to a Python 2 installation.
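As a hedged sketch of how to confirm this after installing anaconda3: the script below looks up the first python on the PATH and prints its version. It is meant to be run with a Python 3 interpreter invoked by its full path (for example the anaconda3 one), and the function name default_python_version is only illustrative.

```python
import shutil
import subprocess


def default_python_version(command="python"):
    """Return (path, version_text) for the first `command` on PATH, or None."""
    path = shutil.which(command)
    if path is None:
        return None
    # Python 2 prints its version to stderr, Python 3 to stdout, so merge both.
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return path, result.stdout.strip()


if __name__ == "__main__":
    found = default_python_version()
    if found is None:
        print("no 'python' found on PATH")
    else:
        path, version = found
        print(path, "->", version)
        if not version.startswith("Python 2"):
            print("warning: 'python' on PATH is not a Python 2 interpreter")
```

If the warning appears, the anaconda3 installer has probably put itself ahead of the system Python on the PATH, which is the situation to avoid here.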

Since I want to have both the spark and spark2 interpreters, I commented out the SPARK_HOME line in the zeppelin-env.sh file:

64619-spark-home.png

Another configuration I changed in this file:

According to the documentation, the variable ZEPPELIN_JAVA_OPTS changed to ZEPPELIN_INTP_JAVA_OPTS in spark2. Since both versions are active, I define both variables:

export ZEPPELIN_JAVA_OPTS="-Dhdp.version=None -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default"

export ZEPPELIN_INTP_JAVA_OPTS="-Dhdp.version=None -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default"

- You need to configure the spark2 interpreter as follows:

64621-spark2-interpreter.jpg

64620-spark-int-python.jpg

I also created a Python interpreter:

64622-python-interpreter.png

Finally, I created a symbolic link at /bin/conda so that conda can be found:

ln -s /opt/anaconda3/bin/conda /bin/conda

Of course, you have to adjust the paths above to match your installation.

Hope that helps.

Kind regards, Paul
