
Introduction

Apache Zeppelin (still incubating at Apache at the time of writing) is one of my favourite tools, and one I try to present to anyone interested in analytics. It is 100% open source, with a smart international team behind it at NFLabs in Korea (soon moving to San Francisco). It is built around the interpreter concept, which allows any language or data-processing backend to be plugged into Apache Zeppelin.

It is very similar to IPython/Jupyter, except that the UI is arguably more appealing and the set of supported interpreters is richer. At the time of writing, Zeppelin supported:

With this rich set of interpreters, Zeppelin makes onboarding onto platforms such as Apache Hadoop, or onto Data Lake architectures, much easier. In those setups the data is consolidated in one place, and organizational units with different skill sets need to access it for their day-to-day work: data discovery, queries, data modelling, data streaming and, finally, data science using Apache Spark.

Apache Zeppelin Overview

With the notebook-style editor and the ability to save notebooks on the fly, you can build some really cool notebooks, whether you are a data engineer, a data scientist or a BI specialist.

Zeppelin Notebook Example
Dataset showing the Health Expenditure of the Australian Government over time by state.

Zeppelin also has clean, basic visualizations built in, and it lets you control what goes into a graph by dragging and dropping fields into the visualization, as below:

Zeppelin Drag and Drop
The sum of government budget healthcare expenditure in Australia by State

When you have finished your notebook story, you can easily turn it into a report and either print it or send it out.

Car Accident Fatalities in Melbourne
Car accident fatalities related to drink driving, showing the deadliest days on the roads and the deadliest accident types during alcohol times

Playing with Zeppelin

If you have never played with Zeppelin before, visit this link for a quick way to get started using the latest Hortonworks tutorial. We are including Zeppelin in HDP as a technical preview, and official support may follow. Check it out Here, and try the different interpreters to see how they interact with Hadoop.

Zeppelin Hub

I was recently given access to the beta version of Hub. Hub is meant to make life in organizations easier when it comes to sharing notebooks between departments or people within the organization.

Let's assume an organization has Marketing, BI and Data Science practices. The three departments overlap in the datasets they use, so there is no longer any need for each department to work in isolation from the others. Instead they can share their experience, show off their notebooks, and work together on the same notebook when it is complicated or needs a mix of skills.

Zeppelin Hub UI

Let's take a deeper look at Hub.

Hub Instances

An instance is backed by a Zeppelin installation somewhere (a server, a laptop, a Hadoop cluster, etc.). Every time you create a new instance, a new token is generated. Add this token to your local Zeppelin installation in /incubator_zeppelin/conf/zeppelin-env.sh, e.g.:

export ZEPPELINHUB_API_TOKEN="f41d1a2b-98f8-XXXX-2575b9b189"

Once the token is added, you will see your notebooks online whenever you connect to Hub (http://zeppelin.hub.com).
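If you register several instances, the token step is easy to script. The following is a minimal bash sketch. It assumes the Zeppelin home directory from the example path above and the conf/zeppelin-env.sh.template file that Zeppelin ships. The helper name set_hub_token is hypothetical and is not part of Zeppelin. The sketch removes any earlier token line before appending the new one, so re-running it leaves exactly one ZEPPELINHUB_API_TOKEN export.

```shell
#!/usr/bin/env bash
# Sketch only: set_hub_token is a hypothetical helper, not a Zeppelin command.
# $1 = Zeppelin home directory (the post uses /incubator_zeppelin)
# $2 = token generated by Zeppelin Hub when the instance was created
set_hub_token() {
  local zeppelin_home="$1" token="$2"
  local env_file="$zeppelin_home/conf/zeppelin-env.sh"

  # Bail out if this does not look like a Zeppelin installation
  if [ ! -d "$zeppelin_home/conf" ]; then
    echo "no conf directory under $zeppelin_home" >&2
    return 1
  fi

  # Start from the bundled template if zeppelin-env.sh does not exist yet
  if [ ! -f "$env_file" ] && [ -f "$env_file.template" ]; then
    cp "$env_file.template" "$env_file"
  fi
  touch "$env_file"

  # Drop any previous token line so repeated runs keep a single export
  grep -v '^export ZEPPELINHUB_API_TOKEN=' "$env_file" > "$env_file.tmp" || true
  mv "$env_file.tmp" "$env_file"

  # Append the new token in the same form as the example above
  echo "export ZEPPELINHUB_API_TOKEN=\"$token\"" >> "$env_file"
}
```

Usage, with a placeholder token: `set_hub_token /incubator_zeppelin "<your-token>"`. Zeppelin reads zeppelin-env.sh when it starts, so restart it afterwards, for example with `/incubator_zeppelin/bin/zeppelin-daemon.sh restart`.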

Hub Spaces

Once an instance is added, you can see all the notebooks for each instance. Each space is either a department or a category of notebooks that needs to be shared with certain people, so you can simply drag and drop notebooks into a space to share them with everyone in that space.

Adding a Notebook to a Space
Showing a Notebook inside Zeppelin Hub

Very cool!

Since Hub is still in beta, there is plenty of work left to do, such as executing notebooks directly from Hub, resizing and formatting, and some other minor issues. I am sure the all-star team at @nflabs will make it happen very soon, as they always have.

If you are interested in trying the beta, you can request access on the Apache Zeppelin website here.

Hortonworks and Apache Zeppelin

Hortonworks is heavily adopting Apache Zeppelin, as shown by its contributions to the product and to Apache Ambari. @ali, one of the rock stars at Hortonworks, created an Apache Zeppelin View for Ambari. It provides authentication for Zeppelin and gives users a single pane of glass for uploading datasets (using the HDFS view in Apache Ambari Views) and for other operational needs.

Apache Ambari with Zeppelin View Integration
Apache Zeppelin Notebook editor from Apache Ambari

If you also want to integrate Zeppelin in Ambari with Apache Spark, just follow the steps on this link.

Hortonworks Gallery for Apache Zeppelin

We recently published a gallery where anyone can contribute their notebooks and share them publicly. All you need to do is grab the notebook folder and upload it. Check it out here.
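Zeppelin's default file-based notebook storage keeps each notebook in its own folder under notebook/. The folder is named after the note ID and holds a note.json file. Here is a minimal bash sketch for grabbing one of those folders, assuming that default layout. The helper export_note is hypothetical. The destination could be, for example, a local clone of the gallery's GitHub repository, but how the gallery actually accepts uploads is not covered here.

```shell
#!/usr/bin/env bash
# Sketch only: export_note is a hypothetical helper, not a Zeppelin command.
# Assumes the default layout $ZEPPELIN_HOME/notebook/<note-id>/note.json.
# $1 = Zeppelin home, $2 = note ID (folder name), $3 = destination directory
export_note() {
  local zeppelin_home="$1" note_id="$2" dest_dir="$3"
  local note_dir="$zeppelin_home/notebook/$note_id"

  # Refuse to copy anything that is not a Zeppelin notebook folder
  if [ ! -f "$note_dir/note.json" ]; then
    echo "no note.json found in $note_dir" >&2
    return 1
  fi

  mkdir -p "$dest_dir"
  # Copy the whole folder so anything stored next to note.json travels too
  cp -R "$note_dir" "$dest_dir/"
  # Print where the copy ended up
  echo "$dest_dir/$note_id"
}
```

Usage, with placeholder values: `export_note /incubator_zeppelin <note-id> ~/zeppelin-notebooks`. The note ID is the last segment of the notebook's URL in the Zeppelin UI.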

If you are not sure how to start, a great way is to browse the Hortonworks Gallery for Apache Zeppelin. It gives you a 360-degree view of the different ways notebooks can be built.

Helium

Project Helium is a revolutionary change in Zeppelin. Helium allows you to integrate almost any standard HTML, CSS and JavaScript as a visualization or a view inside Zeppelin.

A Helium application consists of a view, an algorithm and access to resources. You can find more information about Helium here.

Comments

@Ned Shawa Nice article! Good candidate for the official blog.


Worth mentioning our Zeppelin Gallery:

https://github.com/hortonworks-gallery/zeppelin-notebooks


@Ned Shawa I got access to Zeppelin Hub today. Can I use the Zeppelin that is part of the HDP 2.3.2 sandbox, or do I need to install it from the Apache Zeppelin GitHub?

@azeltov@hortonworks.com

You can, as long as you set the ZEPPELINHUB_API_TOKEN and the Sandbox has a direct internet connection.


Sweet, will try it soon.


Got it syncing to Hub! So if I understand this correctly, if I now want to sync these notebooks to another Zeppelin, do I just put the same "hub_api_token" in that Zeppelin and it will sync to that instance? Or is that a feature that has not been developed yet?