Why is Python interpreter configuration for Zeppelin erased during restart?



### INITIAL STEPS ###


# (1) Ran the below command to install the "community managed" Python interpreter, as documented in:

# HDP Tutorials > Develop with Hadoop > Apache Spark > 3.Getting Started with Apache Zeppelin

# (https://hortonworks.com/tutorial/getting-started-with-apache-zeppelin/)


[root@sandbox-hdp ~]# /usr/hdp/3.0.1.0-187/zeppelin/bin/install-interpreter.sh -n python


# Output:


OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/zeppelin/lib/interpreter/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/zeppelin/lib/slf4j-simple-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/zeppelin/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Install python(org.apache.zeppelin:zeppelin-python:0.8.0) to /usr/hdp/3.0.1.0-187/zeppelin/interpreter/python ...

Interpreter python installed under /usr/hdp/3.0.1.0-187/zeppelin/interpreter/python.


1. Restart Zeppelin

2. Create interpreter setting in 'Interpreter' menu on Zeppelin GUI

3. Then you can bind the interpreter on your note



# (2) Followed the post-installation steps 1-3 successfully




### SYMPTOMS ###



# (1) The Python interpreter disappears from the UI after restarting Zeppelin. It can be re-created via post-installation step 2, but it is lost again at the next restart.


# (2) All the %python interpreter statements fail with the following error, after which the interpreter crashes:


Traceback (most recent call last):

File "/tmp/zeppelin_python-2604294612768410068.py", line 20, in <module>

from py4j.java_gateway import java_import, JavaGateway, GatewayClient

ImportError: No module named py4j.java_gateway
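# The ImportError above means that the Python executable that ran the generated script could not find the py4j package.
# A minimal sketch (not part of the original steps) for checking this from the same Python executable that Zeppelin launches.
# The helper name module_available is hypothetical; it only wraps a plain import attempt.

```python
import importlib
import sys


def module_available(name):
    """Return True if `name` can be imported by the Python running this code."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        # Covers both a missing top-level package and a missing submodule.
        return False


if __name__ == "__main__":
    # Shows which Python executable is being checked.
    print("python executable: %s" % sys.executable)
    print("py4j importable: %s" % module_available("py4j.java_gateway"))
```

# If this reports False for the executable used by the interpreter, the failure is about that Python's module search path rather than about the note itself.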


### TROUBLESHOOTING ###


# Below is the list of the registered interpreters (after the above is executed) in the "zeppelin.interpreters" property value in /usr/hdp/current/zeppelin-server/conf/zeppelin-site.xml:

# (the instructions to check are taken from the "3rd party interpreters" section in https://zeppelin.apache.org/docs/0.6.0/manual/interpreterinstallation.html)

# (the values are comma-separated in the XML configuration file)


org.apache.zeppelin.spark.SparkInterpreter

org.apache.zeppelin.spark.PySparkInterpreter

org.apache.zeppelin.spark.SparkSqlInterpreter

org.apache.zeppelin.spark.DepInterpreter

org.apache.zeppelin.markdown.Markdown

org.apache.zeppelin.angular.AngularInterpreter

org.apache.zeppelin.shell.ShellInterpreter

org.apache.zeppelin.jdbc.JDBCInterpreter

org.apache.zeppelin.phoenix.PhoenixInterpreter

org.apache.zeppelin.livy.LivySparkInterpreter

org.apache.zeppelin.livy.LivyPySparkInterpreter

org.apache.zeppelin.livy.LivySparkRInterpreter

org.apache.zeppelin.livy.LivySparkSQLInterpreter



# As we can see, the Python interpreter classes are clearly missing.

# Apparently, the install-interpreter.sh script does not update the Zeppelin site configuration for the "community managed" interpreters.

# The official Apache Zeppelin installation documentation implies that it should: the manual registration step below is mentioned only in the "3rd party interpreters" section, not for the "community managed" ones:

# "Once you have installed interpreters, you'll need to add interpreter class name into zeppelin.interpreters property in configuration."

# This may be the root cause of the issue.
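# The check above can be sketched in Python, assuming zeppelin-site.xml uses the usual <configuration>/<property>/<name>/<value> layout.
# The function names and the SAMPLE document are made up for illustration; the real file would be read from /usr/hdp/current/zeppelin-server/conf/zeppelin-site.xml.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only, not the real configuration file.
SAMPLE = """<configuration>
  <property>
    <name>zeppelin.interpreters</name>
    <value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.markdown.Markdown</value>
  </property>
</configuration>"""


def get_property(site_xml_text, prop_name):
    """Return the value of the named property, or None if it is absent."""
    root = ET.fromstring(site_xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == prop_name:
            return prop.findtext("value", "")
    return None


def registered_interpreters(site_xml_text):
    """Split the comma-separated zeppelin.interpreters value into class names."""
    value = get_property(site_xml_text, "zeppelin.interpreters") or ""
    return [c.strip() for c in value.split(",") if c.strip()]


def has_package(classes, package_prefix):
    """Return True if any registered class belongs to the given package."""
    return any(c.startswith(package_prefix) for c in classes)


if __name__ == "__main__":
    classes = registered_interpreters(SAMPLE)
    print(has_package(classes, "org.apache.zeppelin.python."))
```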



# But what are the proper interpreter class names for the Python interpreter?

# Surprisingly, it's not documented on zeppelin.apache.org.

# Here's the list of Java classes retrieved from https://github.com/apache/zeppelin/tree/master/python:


IPythonClient.java

IPythonInterpreter.java

PythonCondaInterpreter.java

PythonDockerInterpreter.java

PythonInterpreter.java

PythonInterpreterPandasSql.java

PythonUtils.java

PythonZeppelinContext.java


# Assuming that any class that extends "Interpreter" can be appended to the value list in the "zeppelin.interpreters" property value in /usr/hdp/current/zeppelin-server/conf/zeppelin-site.xml,

# appended the following list:


org.apache.zeppelin.python.IPythonInterpreter,org.apache.zeppelin.python.PythonCondaInterpreter,org.apache.zeppelin.python.PythonDockerInterpreter,org.apache.zeppelin.python.PythonInterpreter,org.apache.zeppelin.python.PythonInterpreterPandasSql
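# A minimal sketch of the append step, assuming the property value is a plain comma-separated string.
# The helper append_classes is hypothetical; it skips names that are already present so the step can be repeated safely.

```python
def append_classes(current_value, new_classes):
    """Append class names to a comma-separated value, skipping duplicates."""
    classes = [c.strip() for c in current_value.split(",") if c.strip()]
    for cls in new_classes:
        if cls not in classes:
            classes.append(cls)
    return ",".join(classes)


# The class names listed above, taken from the question text.
PYTHON_CLASSES = [
    "org.apache.zeppelin.python.IPythonInterpreter",
    "org.apache.zeppelin.python.PythonCondaInterpreter",
    "org.apache.zeppelin.python.PythonDockerInterpreter",
    "org.apache.zeppelin.python.PythonInterpreter",
    "org.apache.zeppelin.python.PythonInterpreterPandasSql",
]

if __name__ == "__main__":
    existing = "org.apache.zeppelin.spark.SparkInterpreter, org.apache.zeppelin.markdown.Markdown"
    print(append_classes(existing, PYTHON_CLASSES))
```

# Whether Zeppelin accepts every one of these classes in the property is still the assumption stated above, not something this sketch verifies.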


# Re-testing symptom (1):

# Created the Python interpreter in the UI and checked the following locations:

# Local path: /etc/zeppelin/conf/interpreter.json - MISSING

# HDFS path: /user/zeppelin/conf/interpreter.json - PRESENT

# Restarted the Zeppelin service


# FAILURE! The Python interpreter configuration is not persisted.

# Re-checked the following locations:

# Local path: /etc/zeppelin/conf/interpreter.json - MISSING

# HDFS path: /user/zeppelin/conf/interpreter.json - MISSING (!!!)


# So the additional configuration is not synced from HDFS to the local file system across restarts, and it is also wiped from the HDFS copy of the configuration file.
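# The PRESENT / MISSING checks on interpreter.json can be sketched as below.
# This assumes the file holds a top-level "interpreterSettings" object whose entries each carry a "group" field; that layout is an assumption and should be confirmed against the actual file. The SAMPLE text and the function name are made up for illustration.

```python
import json

# Illustrative sample only, not the real interpreter.json.
SAMPLE = """{
  "interpreterSettings": {
    "spark": {"id": "spark", "name": "spark", "group": "spark"},
    "md": {"id": "md", "name": "md", "group": "md"}
  }
}"""


def interpreter_groups(interpreter_json_text):
    """Return the sorted interpreter group names found in the settings."""
    data = json.loads(interpreter_json_text)
    settings = data.get("interpreterSettings", {})
    groups = set()
    for setting in settings.values():
        group = setting.get("group")
        if group:
            groups.add(group)
    return sorted(groups)


if __name__ == "__main__":
    groups = interpreter_groups(SAMPLE)
    print("PRESENT" if "python" in groups else "MISSING")
```

# The input text could come from the local file or from the output of hdfs dfs -cat /user/zeppelin/conf/interpreter.json, captured before and after the restart for comparison.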




# Re-testing symptom (2):

# Created the Python interpreter in the UI

# Made a binding to the test note

# Tried to execute print("hello") again


# FAILURE! Still getting "ImportError: No module named py4j.java_gateway"

# Checked the local path for the added interpreter class names:

# /etc/zeppelin/conf/zeppelin-site.xml - MISSING




