Zeppelin is A web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more.
The latest version of zeppelin is 0.6.2 when this article is written, although the community has made lots of effort to improve it. Sometimes you would still meet some weird issues due to environment issue, wrong configuration or zeppelin’s bug of itself. This article would try to illustrate to you to how to diagnose zeppelin if you meet some issues that you can't figure out what’s wrong.
Before I go to details I’d like to give an illustration of zeppelin’s architecture. So that we can understand where to diagnose.
The above is a diagram of zeppelin’s diagram. Overall it has 3 layers:
I would not talk about the details, but just want you to have an overall picture of what components zeppelin has, and usually we hit issues on zeppelin server and interpreter process. Next I will talk about them one by one.
Diagnose Zeppelin Server
The most efficient tool to diagnose one software/library is log, log and log again. Usually you can figure out what’s wrong in log. Zeppelin Server’ log is in folder $ZEPPELIN_LOG_DIR, it is in /var/log/zeppelin for HDP and it is $ZEPPLEIN_HOME/logs if you use apache zeppelin distribution and doesn’t set ZEPPELIN_LOG_DIR. The log file name is zeppelin-<user>-<host>.log, there’s other files under the log dir, I will talk about them in the next section. Zeppelin use log4j and its default log level is INFO. log4j.properties is located in /etc/zeppelin/conf for HDP, and in $ZEPPELIN_HOME/conf for apache zeppelin distribution if you didn’t specify ZEPPELIN_CONF_DIR. You can update log4j.properties to change log level. First change log4j.appender.dailyfile.Threshold to DEBUG, then add package level log setting. Here's my log4j.properties for your reference
According my experience, most of problems happen on the interpreter process side. There’s 2 kinds of scenario.
Interpreter process fail to launch.
Interpreter process can launch but fail to run paragraph.
Zeppelin would launch interpreter process by calling interpreter.sh which is located in $ZEPPELIN_HOME/bin. And each interpreter process has one log file located in $ZEPPELIN_LOG_DIR I mentioned before. The log file pattern is zeppelin-interpreter-<interpreter_name>-<user>-<host>.log. Interpreter process share the same log4j.properties with zeppelin-server, so you can change log configuration as I mentioned above. Usually you can check the interpreter log file to figure out what’s wrong. But sometimes there’s no such log file, usually this is because interpeter.sh fail to launch the interpreter process. For this case, you need to modify log4j.properties as above (change log4j.appender.dailyfile.Threshold to DEBUG and chage the log level of log4j.logger.org.apache.zeppelin.interpreter.remote to DEBUG).
And it is very useful to add the following line to zeppelin-env.sh so that you can see the spark submit command in log.
The following is the output in my machine, you can see the spark-submit command, which configuration we use and what classpath we use. Usually you can get all the context to figure out what’s wrong.
Sometimes logs may still not be sufficient for you, then you need to debug the zeppelin server process and interpreter process. In that case, you need to configure the following enviroment variable in zeppelin-env.sh