Created on 11-08-2015 10:56 PM - edited 08-17-2019 01:57 PM
Apache Zeppelin (an Apache Incubator project at the time of writing this post) is one of my favourite tools, and one I try to position and present to anyone interested in analytics. It is 100% open source, with an intelligent international team behind it at NFLabs in Korea (moving to San Francisco soon). It is built mainly around the interpreter concept, which allows any language or data-processing backend to be plugged into Apache Zeppelin.
It is very similar to IPython/Jupyter, except that the UI is probably more appealing and the set of supported interpreters is richer. At the time of writing this blog, Zeppelin supported:
With this rich set of interpreters, onboarding platforms like Apache Hadoop, or data lake concepts in general, becomes much easier. The data sits consolidated in one place, and different organizational units with different skill sets need to access it and perform their day-to-day duties on it, such as data discovery, queries, data modelling, data streaming and finally data science using Apache Spark.
Apache Zeppelin Overview
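To make the interpreter concept a little more concrete, here is a small Python sketch of how a notebook paragraph could be routed to a pluggable backend. It is not Zeppelin's implementation. It only mimics the convention that a paragraph may start with a `%name` prefix (for example `%md` or `%sql`) that selects the interpreter. The registry, the function names and the `spark` default are assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: interpreter name -> function that handles a paragraph body.
INTERPRETERS: Dict[str, Callable[[str], str]] = {}


def register(name: str, run: Callable[[str], str]) -> None:
    """Plug a new backend in under the given interpreter name."""
    INTERPRETERS[name] = run


def split_paragraph(text: str, default: str = "spark") -> Tuple[str, str]:
    """Split '%name body' into (name, body).

    Paragraphs without a '%name' prefix go to the default interpreter
    (the default name used here is an assumption).
    """
    stripped = text.lstrip()
    if not stripped.startswith("%"):
        return default, text
    parts = stripped[1:].split(None, 1)  # name, then the rest of the paragraph
    if not parts:
        return default, ""
    body = parts[1] if len(parts) > 1 else ""
    return parts[0], body


def run_paragraph(text: str) -> str:
    """Look up the interpreter for a paragraph and hand it the body."""
    name, body = split_paragraph(text)
    if name not in INTERPRETERS:
        raise KeyError(f"no interpreter registered for %{name}")
    return INTERPRETERS[name](body)


# Two toy backends standing in for real interpreters.
register("md", lambda body: f"<p>{body}</p>")
register("sh", lambda body: f"would run shell: {body}")
```

With these toy backends, `run_paragraph("%md hello")` returns `"<p>hello</p>"`. Adding another language is a single `register` call, which is the point of the plug-in design.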
With the notebook style editor and the ability to save notebooks on the fly, you can end up with some really cool notebooks, whether you are a data engineer, data scientist or a BI specialist.
Dataset showing the Health Expenditure of the Australian Government over time by state.
Zeppelin also has basic, clean visualization views built in, and it gives you control over what to include in your graph by dragging and dropping fields into the visualization, as below:
The sum of government budget healthcare expenditure in Australia by State
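The aggregation behind a chart like this one is a plain group-and-sum. The Python sketch below shows the idea outside Zeppelin. The field names (`state`, `year`, `expenditure`) and all the figures are placeholders invented for illustration, not values from the actual dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def sum_by_key(rows: Iterable[Mapping], key: str, value: str) -> Dict[str, float]:
    """Group rows by `key` and sum the `value` field within each group."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)


# Placeholder rows; the numbers are made up and carry no real meaning.
rows = [
    {"state": "State A", "year": 2010, "expenditure": 10.0},
    {"state": "State B", "year": 2010, "expenditure": 4.0},
    {"state": "State A", "year": 2011, "expenditure": 12.5},
]

totals = sum_by_key(rows, key="state", value="expenditure")
print(totals)
```

Dragging a different field into the chart corresponds to changing the `key` or `value` argument here.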
Also, when you are done with your awesome notebook story, you can easily create a report out of it and either print it or send it out.
Car accident fatalities related to alcohol, showing the most fatal days on the streets and the most fatal accident types when alcohol is involved
Playing with Zeppelin
If you have never played with Zeppelin before, visit this link for a quick way to get started using the latest Hortonworks tutorial. We are including Zeppelin as part of HDP as a technical preview, and official support may follow. Check it out here, and try out the different interpreters and how they interact with Hadoop.
I was recently given access to the beta version of Hub. Hub is supposed to make life in organizations easier when it comes to sharing notebooks between different departments or people within the organization.
Let's assume an organization has Marketing, BI and Data Science practices. The three departments overlap when it comes to the datasets being used, so there is no longer any need for each department to work in complete isolation from the others. They can share their experience, brag about their notebooks, and work together on the same notebook when it is complicated or requires different skills.
Zeppelin Hub UI
Let's have a deeper look at Hub...
An Instance is backed by a Zeppelin installation somewhere (server, laptop, Hadoop, etc.). Every time you create a new Instance, a new Token is generated. This token should be added to your local Zeppelin installation in the file /incubator_zeppelin/conf/zeppelin-env.sh, e.g.
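One way to script that step is sketched here in Python. This is only an illustration. The variable name `ZEPPELINHUB_API_TOKEN` is an assumption, so use whatever name Hub shows when the Instance is created. The helper `add_token` is hypothetical and not part of Zeppelin or Hub.

```python
from pathlib import Path

# Assumed variable name; confirm it against the instructions Hub displays.
TOKEN_VAR = "ZEPPELINHUB_API_TOKEN"


def add_token(env_file: Path, token: str, var_name: str = TOKEN_VAR) -> str:
    """Write an `export VAR="token"` line into a zeppelin-env.sh style file.

    Any earlier export of the same variable is dropped, so the file ends up
    holding exactly one token line. All other lines are kept unchanged.
    """
    export_line = f'export {var_name}="{token}"'
    lines = env_file.read_text().splitlines() if env_file.exists() else []
    kept = [ln for ln in lines if not ln.strip().startswith(f"export {var_name}=")]
    kept.append(export_line)
    env_file.write_text("\n".join(kept) + "\n")
    return export_line
```

A call such as `add_token(Path("conf/zeppelin-env.sh"), "<your token>")` would typically be followed by a Zeppelin restart, so that the new environment variable is picked up.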
Once the token is added, you will be able to see the notebooks online whenever you connect to Hub (http://zeppelin.hub.com).
Once an instance is added, you will be able to see all the notebooks for each instance. Since every space is actually either a department or a category of notebooks that needs to be shared across certain people, you can easily drag and drop notebooks into spaces, making them shared across that specific space.
Adding a Notebook to a Space
Showing a Notebook inside Zeppelin Hub
Very cool!
Since it's a beta, there is still much work to be done, like executing notebooks from Hub directly, resizing and formatting, and some other minor issues. I am sure the all-star team @nflabs will make it happen very soon, as they always have.
If you are interested in playing with the beta, you may request access on the Apache Zeppelin website here.
Hortonworks and Apache Zeppelin
Hortonworks is heavily adopting Apache Zeppelin, as shown by the contributions they have made to the product and to Apache Ambari. @ali, one of the rockstars at Hortonworks, created an Apache Zeppelin View on Ambari, which gives Zeppelin authentication and gives users a single pane of glass for uploading datasets using the HDFS view on Apache Ambari Views and for other operational needs.
Apache Ambari with Zeppelin View Integration
Apache Zeppelin Notebook editor from Apache Ambari
If you want to integrate Zeppelin in Ambari with Apache Spark as well, just follow the steps on this link.
Hortonworks Gallery for Apache Zeppelin
Recently we published a Gallery where anyone can contribute and share their notebooks publicly. All you need to do is grab the notebook folder and upload it. Check it out here.
If you are not sure how to start, a great way is to take a look at the Hortonworks Gallery for Apache Zeppelin, where you get a 360-degree view of different ways of creating notebooks.
A Helium application would consist of a View, an Algorithm and Access to the resource. You can get more information on Helium here.