Created 03-14-2018 06:05 PM
I installed the sandbox using the Docker for Windows image.
Sandbox information:
Created on: 01_02_2018_10_47_41
Hadoop stack version: Hadoop 2.7.3.2.6.4.0-91
Ambari Version: 2.6.1.0-143
Ambari Hash: 2989989d67edacff7e9db702b4cf0c080556dddc
Ambari build: Release : 143
Java version: 1.8.0_161
OS Version: CentOS release 6.9 (Final)
In Zeppelin, when I attempt to use the spark2 interpreter with PySpark:
%pyspark
x = 1
(run)
I get the following error:
/usr/hdp/current/spark2-client/python/pyspark/context.py:205: UserWarning: Support for Python 2.6 is deprecated as of Spark 2.0.0
  warnings.warn("Support for Python 2.6 is deprecated as of Spark 2.0.0")
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-3334147833533366579.py", line 302, in <module>
    __zeppelin__._setup_matplotlib()
  File "/tmp/zeppelin_pyspark-3334147833533366579.py", line 141, in _setup_matplotlib
    import backend_zinline
  File "/usr/hdp/current/zeppelin-server/interpreter/lib/python/backend_zinline.py", line 30, in <module>
    import mpl_config
  File "/usr/hdp/current/zeppelin-server/interpreter/lib/python/mpl_config.py", line 99, in <module>
    _init_config()
  File "/usr/hdp/current/zeppelin-server/interpreter/lib/python/mpl_config.py", line 83, in _init_config
    fmt = matplotlib.rcParams['savefig.format']
KeyError: 'savefig.format'
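The last frame shows Zeppelin's mpl_config.py reading matplotlib.rcParams['savefig.format'], so the KeyError means the matplotlib imported by this interpreter has no such key. A minimal diagnostic sketch that could be pasted into a %pyspark paragraph; the helper names are hypothetical and it only uses the standard library plus whatever matplotlib is importable:

```python
def missing_rc_keys(rc_params, required=("savefig.format",)):
    """Return the keys from `required` that are absent from an rcParams-like mapping."""
    return [key for key in required if key not in rc_params]


def describe_matplotlib():
    """Describe the matplotlib this interpreter imports and any missing rcParams keys."""
    try:
        import matplotlib
    except ImportError:
        return ["matplotlib: not importable"]
    missing = missing_rc_keys(matplotlib.rcParams)
    return [
        "matplotlib %s from %s" % (matplotlib.__version__, matplotlib.__file__),
        "missing rcParams keys: %s" % (", ".join(missing) or "none"),
    ]


# Print one diagnostic line per entry (written to stay compatible with Python 2.6).
for line in describe_matplotlib():
    print(line)
```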
Created 03-14-2018 06:38 PM
@Patrick Young Can you check the Python version on your cluster nodes? Is it 2.6 by any chance?
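Zeppelin may launch a different interpreter than the one a shell finds when you type python, so it can help to ask the interpreter itself. A small sketch to run in a %pyspark paragraph; deprecated_python is a hypothetical helper, and its 2.7 threshold simply mirrors the "Python 2.6 is deprecated as of Spark 2.0.0" warning in the traceback above:

```python
import sys


def deprecated_python(version_info=None):
    """Return True if the interpreter is older than Python 2.7.

    Spark 2.0.0 deprecated Python 2.6 support, as the UserWarning in the traceback says.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) < (2, 7)


# Which binary is actually running this paragraph, and which version it is.
print("executable: %s" % sys.executable)
print("version: %s" % sys.version.split()[0])
print("deprecated for Spark 2: %s" % deprecated_python())
```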
Created 03-14-2018 07:54 PM
In the HDP shell, python --version reports Python 2.6.6.
I am confused about why the sandbox image would ship a deprecated version of Python. If it needs Python 3 to work properly, then Python 3 should be installed by default when the virtual machine is first created.
I tried installing Python 3 on CentOS and restarting the HDP services, but without success.
Could you please give me step-by-step instructions for going from a fresh install on VMware Workstation, with the error detailed above, to a state where I can use Zeppelin with spark2 and PySpark? It would be much appreciated. Thank you.
Created on 03-15-2018 09:02 AM - edited 08-18-2019 02:24 AM
You need to follow several steps to make this work.
About Python:
- I installed Anaconda3, and the critical step is to NOT let Anaconda3 add itself to the environment variables (PATH). The HDP platform needs Python 2 for some scripts, so the python found on the PATH must still resolve to a Python 2 installation.
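A minimal way to confirm that Anaconda stayed off the PATH, assuming it lives in /opt/anaconda3 (the prefix used for the conda symlink in this post; adjust it to yours). anaconda_on_path is a hypothetical helper that only inspects the PATH string:

```python
import os


def anaconda_on_path(path_value, anaconda_bin="/opt/anaconda3/bin"):
    """Return True if anaconda_bin appears as an entry in a PATH-style string."""
    target = os.path.normpath(anaconda_bin)
    # Skip empty entries produced by leading, trailing or doubled separators.
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(os.path.normpath(entry) == target for entry in entries)


if __name__ == "__main__":
    if anaconda_on_path(os.environ.get("PATH", "")):
        print("Warning: Anaconda is on PATH; HDP scripts may pick up Python 3")
    else:
        print("OK: Anaconda is not on PATH")
```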
Since I want to have both the spark and spark2 interpreters, I commented out the SPARK_HOME line in the zeppelin-env.sh file.
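Commenting out the line just means putting a # in front of it. As a rough sketch, the same edit can be scripted; comment_out_export is a hypothetical helper, and it assumes the variable is set on a line of the form export SPARK_HOME=... in zeppelin-env.sh:

```python
import re


def comment_out_export(text, name):
    """Prefix every active `export NAME=...` line in a shell script with '# '.

    Lines that are already commented out, or that export a different
    variable (e.g. SPARK_HOME_X), are left unchanged.
    """
    pattern = re.compile(r"^([ \t]*)(export[ \t]+%s=)" % re.escape(name), re.MULTILINE)
    return pattern.sub(r"\1# \2", text)


if __name__ == "__main__":
    # Hypothetical sample text; your real zeppelin-env.sh values will differ.
    sample = "export SPARK_HOME=/some/spark/path\nexport OTHER_VAR=1\n"
    print(comment_out_export(sample, "SPARK_HOME"))
```

To apply it, read zeppelin-env.sh into a string, pass it through comment_out_export, and write the result back (keep a backup of the original file).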
Another setting I changed in this file: according to the documentation, the variable ZEPPELIN_JAVA_OPTS was changed in spark2 to ZEPPELIN_INTP_JAVA_OPTS. Since both versions are active, both variables are defined:
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=None -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default"
export ZEPPELIN_INTP_JAVA_OPTS="-Dhdp.version=None -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default"
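Because the two variables are meant to carry identical options, a quick consistency check can catch slips such as a missing space after export (bash reads exportZEPPELIN_JAVA_OPTS="..." as an assignment to a different variable, so ZEPPELIN_JAVA_OPTS is never set). A hypothetical helper, assuming each variable is set on a single export NAME="..." line:

```python
import re
import shlex

# Matches active (uncommented) `export NAME=value` lines; value is the rest of the line.
EXPORT_RE = re.compile(r"^[ \t]*export[ \t]+([A-Za-z_][A-Za-z0-9_]*)=(.*)$", re.MULTILINE)


def parse_exports(text):
    """Map NAME -> value for each active `export NAME=value` line (last one wins)."""
    exports = {}
    for name, raw in EXPORT_RE.findall(text):
        # shlex removes the surrounding quotes the way the shell would.
        exports[name] = " ".join(shlex.split(raw, comments=True))
    return exports


def check_java_opts(text):
    """Return a list of problems with the two Zeppelin JAVA_OPTS variables."""
    exports = parse_exports(text)
    names = ("ZEPPELIN_JAVA_OPTS", "ZEPPELIN_INTP_JAVA_OPTS")
    problems = ["%s is not exported" % name for name in names if name not in exports]
    if not problems and exports[names[0]] != exports[names[1]]:
        problems.append("ZEPPELIN_JAVA_OPTS and ZEPPELIN_INTP_JAVA_OPTS differ")
    return problems
```

Calling check_java_opts on the contents of zeppelin-env.sh should return an empty list once both lines are in place.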
- You need to configure the spark2 interpreter as follows:
I also created a Python interpreter:
Finally, I created a symbolic link in /bin so that conda can be found:
ln -s /opt/anaconda3/bin/conda /bin/conda
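If you prefer to script this step so it can be rerun safely, here is a rough Python 3 equivalent of that ln -s command. ensure_symlink is a hypothetical helper; writing to /bin normally requires root, and the Anaconda path is the one from this post:

```python
import os


def ensure_symlink(target, link_path):
    """Create link_path -> target unless an identical symlink already exists.

    Returns True if a link was created, False if it was already in place.
    Raises FileExistsError if link_path exists but is something else.
    """
    if os.path.islink(link_path):
        current = os.readlink(link_path)
        if current == target:
            return False
        raise FileExistsError("%s already points to %s" % (link_path, current))
    if os.path.exists(link_path):
        raise FileExistsError("%s exists and is not a symlink" % link_path)
    os.symlink(target, link_path)
    return True
```

For the setup above, the call would be ensure_symlink("/opt/anaconda3/bin/conda", "/bin/conda"), run as root.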
Of course, you will have to adjust the paths above to match your installation.
Hope that helps.
Kind regards, Paul