Understanding Zeppelin interpreter architecture

Hi,

Just after restarting the Zeppelin server, I ran a simple %spark note. At the time, I was the only user on the Zeppelin server.

When I look at the footprint of this interaction in the local process list, I see 3 additional processes (the one on top, 12268, is the Zeppelin server itself!):

#> ps -u zeppelin -f --forest

UID PID PPID C STIME TTY TIME CMD
zeppelin 12268 1 0 11:33 ? 00:00:35 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64/bin/java -Dhdp.version=2.6.2.0-205 -Dspark.executor.memor (cut off ...)
zeppelin 26818 12268 0 12:53 ? 00:00:00 \_ /bin/bash /usr/hdp/current/zeppelin-server/bin/interpreter.sh -d /usr/hdp/current/zeppelin-server/interpreter/spar (cut off ...)
zeppelin 26830 26818 0 12:53 ? 00:00:00 \_ /bin/bash /usr/hdp/current/zeppelin-server/bin/interpreter.sh -d /usr/hdp/current/zeppelin-server/interpreter/ (cut off ...)
zeppelin 26831 26830 6 12:53 ? 00:01:09 \_ /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64/bin/java -Dhdp.version=2.6.2.0-205 -cp /etc/z (cut off ...)

Although this is not very clear in the output above, the 4 processes form a chain of parent > child relationships, as the following output shows:

#> ps -u zeppelin --forest

PID TTY TIME CMD

12268 ? 00:00:35 java
26818 ? 00:00:00 \_ interpreter.sh
26830 ? 00:00:00   \_ interpreter.sh
26831 ? 00:01:09     \_ java
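The same parent > child chain can also be read straight from /proc, without relying on ps formatting. This is a minimal sketch assuming a Linux host with /proc mounted; the script name `ancestry.py` and the helper names are hypothetical, not part of Zeppelin or HDP.

```python
from __future__ import annotations

import os
import sys


def read_stat(pid: int) -> tuple[str, int]:
    """Return (comm, ppid) for a PID by parsing /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name (comm) is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' instead of on whitespace.
    head, _, rest = data.rpartition(")")
    comm = head.partition("(")[2]
    fields = rest.split()
    # After the comm field: fields[0] is the state, fields[1] is the parent PID.
    return comm, int(fields[1])


def ancestry(pid: int) -> list[tuple[int, str]]:
    """List (pid, comm) pairs from pid up to the topmost ancestor (PPID 0)."""
    chain = []
    while pid > 0:
        comm, ppid = read_stat(pid)
        chain.append((pid, comm))
        pid = ppid
    return chain


if __name__ == "__main__":
    # Usage: python3 ancestry.py [PID]  (defaults to this script's own PID)
    arg = sys.argv[1] if len(sys.argv) > 1 else ""
    start = int(arg) if arg.isdigit() else os.getpid()
    # Print the oldest ancestor first, indenting each generation like ps --forest.
    for depth, (pid, comm) in enumerate(reversed(ancestry(start))):
        print(f"{'  ' * depth}{pid} {comm}")
```

Run as `python3 ancestry.py 26831` on the Zeppelin host, this should walk through 12268, 26818, 26830 and 26831, preceded by PID 1 (the PPID of 12268 in the `ps -f` output). A process that exits mid-walk will raise `FileNotFoundError`, which is acceptable for a one-off diagnostic.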

PID 12268 is the Zeppelin server itself. The last one, PID 26831, is the local Spark instance that runs against YARN.

My actual question is about PIDs 26818 and 26830, which seem to be identical:

#> ps aux | grep 26818

zeppelin 26818 0.0 0.0 113128 1568 ? S 12:53 0:00 /bin/bash /usr/hdp/current/zeppelin-server/bin/interpreter.sh -d /usr/hdp/current/zeppelin-server/interpreter/spark -p 33433 -l /usr/hdp/current/zeppelin-server/local-repo/2CKX8WPU1 -g spark


#> ps aux | grep 26830

zeppelin 26830 0.0 0.0 113124 636 ? S 12:53 0:00 /bin/bash /usr/hdp/current/zeppelin-server/bin/interpreter.sh -d /usr/hdp/current/zeppelin-server/interpreter/spark -p 33433 -l /usr/hdp/current/zeppelin-server/local-repo/2CKX8WPU1 -g spark
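Since `ps aux` truncates nothing here but also shows nothing about ancestry, a more direct check is to compare the full argument vectors and the parent link from /proc. This is a sketch assuming Linux /proc; `compare_pids.py` and the helper names are hypothetical. One general Linux fact to keep in mind: a process created by fork() that has not yet called exec() reports the same argv as its parent, so identical command lines alone do not prove two separate launches.

```python
from __future__ import annotations

import os
import sys


def read_cmdline(pid: int) -> list[str]:
    """Return the argv of a process from /proc/<pid>/cmdline (Linux only)."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()
    # Arguments are separated by NUL bytes and the last one is NUL-terminated.
    parts = raw.split(b"\0")
    if parts and parts[-1] == b"":
        parts.pop()
    return [p.decode(errors="replace") for p in parts]


def read_ppid(pid: int) -> int:
    """Return the parent PID from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        # Skip past "pid (comm)"; comm may contain spaces, so use the last ')'.
        rest = f.read().rpartition(")")[2].split()
    # rest[0] is the process state, rest[1] is the parent PID.
    return int(rest[1])


def compare(parent: int, child: int) -> dict:
    """Report whether two PIDs share an argv and whether `parent` is `child`'s parent."""
    return {
        "same_cmdline": read_cmdline(parent) == read_cmdline(child),
        "is_parent": read_ppid(child) == parent,
    }


if __name__ == "__main__":
    # Usage: python3 compare_pids.py PARENT_PID CHILD_PID
    pids = [int(a) for a in sys.argv[1:3] if a.isdigit()]
    if len(pids) == 2:
        print(compare(*pids))
```

For the two processes above, `python3 compare_pids.py 26818 26830` would show whether the argv really matches byte for byte and confirm that 26830 is a direct child of 26818.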

PID 26830 corresponds to the PID registered in /var/run/zeppelin/zeppelin-interpreter-spark-zeppelin-<hostname>.pid. So I understand that an interpreter instance is launched when the %spark note is run from the UI.
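The pid file can be cross-checked against the live process tree in the same way. This sketch assumes Linux /proc and a pid file containing a single decimal PID; the helper names are hypothetical and nothing here is a Zeppelin API.

```python
from __future__ import annotations

import os


def pid_from_file(path: str) -> int:
    """Read a PID stored as plain text in a .pid file."""
    with open(path) as f:
        return int(f.read().strip())


def _ppid_of(pid: int | str) -> int:
    """Parent PID from /proc/<pid>/stat (the field right after state)."""
    with open(f"/proc/{pid}/stat") as f:
        return int(f.read().rpartition(")")[2].split()[1])


def describe(pid: int) -> dict:
    """Return liveness, parent PID and direct child PIDs for a PID (Linux only)."""
    if not os.path.exists(f"/proc/{pid}"):
        return {"pid": pid, "alive": False}
    children = []
    # Scan every numeric /proc entry and keep those whose parent is `pid`.
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            if _ppid_of(entry) == pid:
                children.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return {"pid": pid, "alive": True, "ppid": _ppid_of(pid), "children": sorted(children)}
```

On the host above, calling `describe(pid_from_file(...))` with the pid file path (real host name substituted for `<hostname>`) should, if the tree still matches the `ps --forest` output, report a PPID of 26818 and list 26831 among the children.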

But what is the meaning and function of that identical, intermediate PID 26818?

Is this specific to my environment, or is it by design?