
Error running spark2 interpreter in Zeppelin with pyspark due to matplotlib

New Contributor

I installed the sandbox using the Docker for Windows image.

Sandbox information:
Created on: 01_02_2018_10_47_41
Hadoop stack version: Hadoop 2.7.3.2.6.4.0-91
Ambari Version: 2.6.1.0-143
Ambari Hash: 2989989d67edacff7e9db702b4cf0c080556dddc
Ambari build: Release : 143
Java version: 1.8.0_161
OS Version: CentOS release 6.9 (Final)

In Zeppelin, when I attempt to use the spark2 interpreter with pyspark:

%pyspark

x = 1

(run)

I get the following error:

/usr/hdp/current/spark2-client/python/pyspark/context.py:205: UserWarning: Support for Python 2.6 is deprecated as of Spark 2.0.0
  warnings.warn("Support for Python 2.6 is deprecated as of Spark 2.0.0")
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-3334147833533366579.py", line 302, in <module>
    __zeppelin__._setup_matplotlib()
  File "/tmp/zeppelin_pyspark-3334147833533366579.py", line 141, in _setup_matplotlib
    import backend_zinline
  File "/usr/hdp/current/zeppelin-server/interpreter/lib/python/backend_zinline.py", line 30, in <module>
    import mpl_config
  File "/usr/hdp/current/zeppelin-server/interpreter/lib/python/mpl_config.py", line 99, in <module>
    _init_config()
  File "/usr/hdp/current/zeppelin-server/interpreter/lib/python/mpl_config.py", line 83, in _init_config
    fmt = matplotlib.rcParams['savefig.format']
KeyError: 'savefig.format'


Re: Error running spark2 interpreter in Zeppelin with pyspark due to matplotlib

@Patrick Young Can you check your python version on the cluster nodes? Is it 2.6 by any chance?
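A minimal way to check both the interpreter version and the matplotlib setting from the traceback, assuming you can run plain Python on the node. The function names here are illustrative and are not part of Zeppelin or Spark; the string formatting is kept old-style so the script also runs on Python 2.6.

```python
import sys


def describe_python(version_info=sys.version_info):
    """Return ("major.minor", is_2_6_or_older) for the given version tuple."""
    major, minor = version_info[0], version_info[1]
    return "%d.%d" % (major, minor), (major, minor) <= (2, 6)


def matplotlib_has_savefig_format():
    """Return True/False if matplotlib imports, or None if it is not installed."""
    try:
        import matplotlib
    except ImportError:
        return None
    # This is the key that mpl_config.py reads in the traceback above.
    return 'savefig.format' in matplotlib.rcParams


if __name__ == "__main__":
    print(describe_python())
    print(matplotlib_has_savefig_format())
```

If the second value is False, the matplotlib that this Python picks up does not define the 'savefig.format' setting, which matches the KeyError in the question.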

Re: Error running spark2 interpreter in Zeppelin with pyspark due to matplotlib

New Contributor

In the HDP shell, python --version reports 2.6.6.

I am confused why the sandbox image would use a deprecated version of python. If it needs python3 to work properly, then python3 should be installed by default when the Virtual Machine is first created.

I tried installing python3 on CentOS and restarting the HDP services, but without success.

Please could you give me step by step instructions to go from a fresh install on VMWare Workstation with the error detailed above to a state where I am able to use Zeppelin with spark2 and pyspark. It would be much appreciated. Thank you.

Re: Error running spark2 interpreter in Zeppelin with pyspark due to matplotlib

New Contributor

Hi @Patrick Young

You need to follow several steps to make this work.

About python:

- I installed anaconda3. The critical step is not to let the installer add anaconda3 to the environment variables. The HDP platform needs Python 2 for some scripts, so the python path must still resolve to a Python 2 installation.

Since I want to have both the spark and spark2 interpreters, I commented out the SPARK_HOME line in the zeppelin-env.sh file:
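As a quick sanity check for the point above, here is a small sketch that lists any PATH entries that look like an Anaconda directory. The "anaconda" substring match is an assumption about how the install directory is named (for example /opt/anaconda3), and the function name is made up for illustration.

```python
import os


def anaconda_entries(path_value, marker="anaconda"):
    """Return PATH entries whose text contains the marker (case-insensitive)."""
    return [p for p in path_value.split(os.pathsep) if marker in p.lower()]


if __name__ == "__main__":
    # An empty list means no Anaconda directory was added to PATH.
    print(anaconda_entries(os.environ.get("PATH", "")))
```

An empty result suggests the installer did not modify PATH, so plain python should still resolve to the system Python 2.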

64619-spark-home.png

Another configuration I changed in this file:

According to the documentation, the variable ZEPPELIN_JAVA_OPTS is replaced by ZEPPELIN_INTP_JAVA_OPTS for spark2. Since both versions are active, I defined both variables:

export ZEPPELIN_JAVA_OPTS="-Dhdp.version=None -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default"

export ZEPPELIN_INTP_JAVA_OPTS="-Dhdp.version=None -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default"
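Because the two variables carry the same value, it is easy for them to drift apart after an edit. The following is only a sketch of one way to generate both lines from a single list of options; the option values are copied from the two export lines above, while SPARK_OPTS and export_lines are hypothetical names, not anything Zeppelin provides.

```python
# Option values copied from the two export lines above.
SPARK_OPTS = [
    ("hdp.version", "None"),
    ("spark.executor.memory", "512m"),
    ("spark.executor.instances", "2"),
    ("spark.yarn.queue", "default"),
]


def export_lines(pairs, names=("ZEPPELIN_JAVA_OPTS", "ZEPPELIN_INTP_JAVA_OPTS")):
    """Build one shell export line per variable name, all sharing one value."""
    value = " ".join("-D%s=%s" % (key, val) for key, val in pairs)
    return ['export %s="%s"' % (name, value) for name in names]


if __name__ == "__main__":
    for line in export_lines(SPARK_OPTS):
        print(line)
```

The printed lines can be pasted into zeppelin-env.sh; changing a value in SPARK_OPTS then updates both variables together.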

- You need to configure the spark2 interpreter as follows:

64621-spark2-interpreter.jpg

64620-spark-int-python.jpg

I also created a Python interpreter:

64622-python-interpreter.png

Finally, I created a symbolic link at /bin/conda so that conda can be found:

ln -s /opt/anaconda3/bin/conda /bin/conda

Adjust the paths above to match your installation.
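If you prefer to script the symlink step so it can be re-run safely, a rough Python equivalent of the ln -s command could look like this. ensure_symlink is a hypothetical helper name; it only wraps the standard os.symlink call and skips the step when something already exists at the link path.

```python
import os


def ensure_symlink(target, link_path):
    """Create link_path -> target unless something already exists at link_path.

    Returns True if the link was created, False if the step was skipped.
    """
    if os.path.lexists(link_path):
        return False
    os.symlink(target, link_path)
    return True
```

For the paths in this post the call would be ensure_symlink("/opt/anaconda3/bin/conda", "/bin/conda"), which needs root privileges just like the ln -s command.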

Hope that helps.

Kind regards, Paul
