Created on 03-05-2018 09:06 PM - edited 08-18-2019 02:01 AM
Hello,
I'm using HDP sandbox 2.6.4, with Zeppelin Notebook installed.
When I try to run PySpark in Zeppelin, it doesn't work.
Example :
%pyspark
print "Test"
Out:
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-8142801691187202169.py", line 302, in <module>
    __zeppelin__._setup_matplotlib()
  File "/tmp/zeppelin_pyspark-8142801691187202169.py", line 141, in _setup_matplotlib
    import backend_zinline
  File "/usr/hdp/current/zeppelin-server/interpreter/lib/python/backend_zinline.py", line 30, in <module>
    import mpl_config
  File "/usr/hdp/current/zeppelin-server/interpreter/lib/python/mpl_config.py", line 99, in <module>
    _init_config()
  File "/usr/hdp/current/zeppelin-server/interpreter/lib/python/mpl_config.py", line 83, in _init_config
    fmt = matplotlib.rcParams['savefig.format']
KeyError: 'savefig.format'
I can't cancel the execution from the notebook.
And in the Resource Manager UI, the job keeps running indefinitely (see attached PNG file).
Thank you for your help
Created 03-05-2018 10:03 PM
According to this JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-3094
The issue is the version of the matplotlib package. I have version 0.99.1.1 installed, but the minimum required version is 1.2.x.
I can't upgrade it with pip because the installed Python is 2.6.6, which is deprecated.
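For anyone who wants to check this on their own sandbox, here is a minimal sketch of the version comparison. The helper names version_tuple and meets_minimum are made up for illustration, and the default minimum of "1.2" is simply the figure quoted above, not something read from Zeppelin itself.

```python
import re


def version_tuple(version):
    """Convert a dotted version string such as '0.99.1.1' to a tuple of ints.

    Parsing stops at the first part that does not start with a digit,
    so '1.2.x' becomes (1, 2).
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum="1.2"):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


# The matplotlib version reported in this thread is too old:
print(meets_minimum("0.99.1.1"))  # False
```

In a notebook paragraph you could pass matplotlib.__version__ to meets_minimum instead of the hard-coded string.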
Created on 03-05-2018 10:41 PM - edited 08-18-2019 02:01 AM
Problem solved!
First: Install Python 2.7 using this tutorial: https://tecadmin.net/install-python-2-7-on-centos-rhel/
Second: Install matplotlib for Python 2.7: python2.7 -m pip install matplotlib
Third: Configure the new version of Python as the default for Spark in Zeppelin using this tutorial: https://community.hortonworks.com/content/supportkb/146508/how-to-use-alternate-python-version-for-s...
Now it works!
Created 03-07-2018 04:03 PM
And after installing Python 2.7, don't forget to change the Zeppelin Spark interpreter setting to:
zeppelin.pyspark.python | python2.7 |
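To confirm the interpreter change took effect, something like the following could be run in a %pyspark paragraph after restarting the interpreter. This is only a sketch: the function names interpreter_summary and savefig_format_available are invented for this example, and the second check just looks for the rcParams key that raised the KeyError in the original traceback.

```python
import sys


def interpreter_summary():
    """Report which Python executable and version is running this code."""
    return {
        "executable": sys.executable,
        "version": tuple(sys.version_info[:3]),
    }


def savefig_format_available():
    """Check for the rcParams key that the traceback failed on.

    Returns True or False if matplotlib can be imported,
    and None if matplotlib is not installed for this interpreter.
    """
    try:
        import matplotlib
    except ImportError:
        return None
    return "savefig.format" in matplotlib.rcParams


summary = interpreter_summary()
print("Python %d.%d.%d at %s" % (summary["version"] + (summary["executable"],)))
print("savefig.format available: %s" % savefig_format_available())
```

If the first line still reports 2.6.x, the paragraph is probably still using the old interpreter; if the second line shows False or None, matplotlib for that interpreter is either too old or missing.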