Community Articles

vshukla · 10-13-2017

Zeppelin Best Practices

Install & Versions
1. Leverage Ambari to install Zeppelin, and always use the latest available version of Zeppelin. HDP 2.6.2 ships Zeppelin 0.7.2, which contains many useful stability and security fixes that will improve your experience. Zeppelin in HDP 2.5.x has many known issues that were resolved in HDP 2.6.2.
Deployment Choices
1. While you can install Zeppelin on any node type, the best place is a gateway node. A gateway node makes the most sense because, when the cluster is firewalled off and protected from the outside, users can still reach the gateway node.
2. Hardware Requirement
   1. More memory and more cores are better.
   2. Memory: Minimum of 64 GB per node
   3. Cores: Minimum of 8 cores
   4. Number of users: A given Zeppelin node can support 8-10 users. To support more users, set up multiple Zeppelin instances. More details are in the Multi-Tenancy & HA section.
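
The sizing guidance above reduces to simple arithmetic. The following is a minimal sketch, assuming the upper bound of 10 users per node from this article; the function name and constant are hypothetical and not part of Zeppelin or Ambari.

```python
import math

# Assumption: the article's guidance of roughly 8-10 users per Zeppelin node.
USERS_PER_INSTANCE = 10


def zeppelin_instances_needed(total_users: int,
                              users_per_instance: int = USERS_PER_INSTANCE) -> int:
    """Return how many Zeppelin instances are needed for a given user count."""
    if total_users < 0:
        raise ValueError("total_users must be non-negative")
    if users_per_instance <= 0:
        raise ValueError("users_per_instance must be positive")
    # Round up: a partially filled instance is still a full instance to deploy.
    return math.ceil(total_users / users_per_instance)


if __name__ == "__main__":
    for users in (8, 25, 40):
        print(users, "users ->", zeppelin_instances_needed(users), "instance(s)")
```

Passing `users_per_instance=8` gives a more conservative estimate based on the lower end of the 8-10 range.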

Security: As with any software, security depends on the threat matrix and deployment choices. This section assumes a multi-tenant (MT) Zeppelin deployment.

Authentication

Kerberize HDP Cluster using Ambari
Configure Zeppelin to leverage corporate LDAP for authentication
Don’t use Zeppelin’s local user-based authentication, except for demo setups.

Authorization

Limit end users' access to interpreter configuration. Interpreter configuration is shared, so only admins should be able to change it. Leverage Zeppelin’s Shiro configuration to achieve this.
With the Livy interpreter, Spark jobs are submitted to the HDP cluster under the end user's identity. All Ranger-based policy controls apply.
With the JDBC interpreter, Hive and Spark access happens under the end user's identity. All Ranger-based policy controls apply.
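
One way to verify the interpreter restriction is to check the `[urls]` section of shiro.ini. The sketch below is a hypothetical helper, not a Zeppelin tool; it assumes the interpreter REST endpoints are matched by the pattern `/api/interpreter/**` and that the admin role is named `admin`, both of which should be confirmed against your own shiro.ini.

```python
def interpreter_api_restricted(shiro_ini_text: str, admin_role: str = "admin") -> bool:
    """Return True if /api/interpreter/** is limited to the given role in [urls]."""
    in_urls = False
    for raw in shiro_ini_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", ";")):
            continue
        # Track which section we are in.
        if line.startswith("[") and line.endswith("]"):
            in_urls = line.lower() == "[urls]"
            continue
        if in_urls and "=" in line:
            pattern, filters = (part.strip() for part in line.split("=", 1))
            if pattern == "/api/interpreter/**":
                return f"roles[{admin_role}]" in filters.replace(" ", "")
    # No rule found for the interpreter API: treat as not restricted.
    return False


# Hypothetical sample content, for illustration only.
LOCKED_DOWN = """\
[urls]
/api/interpreter/** = authc, roles[admin]
/** = authc
"""

if __name__ == "__main__":
    print(interpreter_api_restricted(LOCKED_DOWN))
```

A commented-out rule counts as missing, so the check returns `False` in that case.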

Passwords:

Leverage Zeppelin’s support for keeping the LDAP and JDBC passwords in a Hadoop credential store. Don’t put passwords in clear text in shiro.ini.
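
A quick audit for clear-text passwords can be scripted. This is a heuristic sketch, not a Zeppelin feature: the function name is hypothetical, the sample file content is made up, and it only flags keys whose name ends in "password" with a literal value. It does not inspect the `[users]` section used by local authentication.

```python
from typing import List, Tuple


def find_cleartext_passwords(shiro_ini_text: str) -> List[Tuple[int, str]]:
    """Return (line number, key) for each key ending in 'password' that has a value."""
    findings = []
    for line_no, raw in enumerate(shiro_ini_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blanks, comments, section headers, and lines without a key/value pair.
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.lower().endswith("password") and value:
            findings.append((line_no, key))
    return findings


# Hypothetical shiro.ini content, for illustration only.
SAMPLE = """\
[main]
ldapRealm.contextFactory.url = ldap://ldap.example.com:389
ldapRealm.contextFactory.systemPassword = s3cret
# activeDirectoryRealm.systemPassword = commented-out
[urls]
/** = authc
"""

if __name__ == "__main__":
    for line_no, key in find_cleartext_passwords(SAMPLE):
        print(f"line {line_no}: {key} is set in clear text")
```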

Multi-Tenancy & HA

In an MT environment, only allow the admin role to access interpreter configuration.
A given Zeppelin instance should support fewer than 10 users. To support more users, set up multiple Zeppelin instances and put an HTTP proxy such as NGINX with sticky sessions in front of them, so that the same user is always routed to the same Zeppelin instance. Sticky sessions are needed because Zeppelin stores notebooks under the directory of a given Zeppelin instance. If you use a network storage system, the Zeppelin notebook directory can be placed on the network storage, and in that case sticky sessions are not needed. With the upcoming HDP 2.6.3, Zeppelin will store notebooks in HDFS and sticky sessions will no longer be necessary.
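
The idea behind sticky routing can be illustrated in a few lines. This is only a sketch of the concept: in practice the proxy's own session-affinity mechanism does this work. The function name and host names are hypothetical, and hashing the user name is one assumed strategy among several.

```python
import hashlib
from typing import Sequence


def pick_instance(username: str, instances: Sequence[str]) -> str:
    """Deterministically map a user name to one Zeppelin instance."""
    if not instances:
        raise ValueError("at least one Zeppelin instance is required")
    # A stable hash (unlike Python's built-in hash()) gives the same result
    # across processes and restarts, so a user always lands on the same node.
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


# Hypothetical instance addresses, for illustration only.
INSTANCES = ["zeppelin-1.example.com:9995", "zeppelin-2.example.com:9995"]

if __name__ == "__main__":
    for user in ("alice", "bob", "carol"):
        print(user, "->", pick_instance(user, INSTANCES))
```

Note that with this simple modulo scheme, adding or removing an instance remaps some users, and their notebooks would stay on the old instance unless the notebook directory is on shared storage.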

Interpreters

Leverage the Livy interpreter for Spark jobs against the HDP cluster. Don’t use the Spark interpreter, since it does not provide ideal identity propagation.
Avoid using the Shell interpreter, since its security isolation isn’t ideal.
Don’t use the interpreter UI for impersonation. It works for Livy and JDBC (Hive); for all other interpreters we don’t officially support it.
Users should restart their own interpreter session using the button on the notebook page rather than the interpreter page, which would restart sessions for all users.
Livy interpreter
JDBC interpreter

Also see Jianfeng Zhang's Zeppelin Best Practices notebook.

KenTabuchi · 02-14-2022

Hi @vshukla,

Thank you for the article. Could you share the reasoning behind the Deployment Choices recommendations? My customer especially wants to know the justification for the minimum of 64 GB of memory and the minimum of 8 cores per node.

Thank you!

Cloudera Community
