Created on 11-08-2015 10:56 PM - edited 08-17-2019 01:57 PM
Apache Zeppelin (Incubator at the time of writing this post) is one of my favourite tools that I try to position and present to anyone interested in Analytics, Its 100% open source with an intelligent international team behind it in Korea (NFLABS) (Moving to San Francisco soon), its mainly based on interpreter concept that allows any language/data-processing-backend to be plugged into Apache Zeppelin.
Very similar to IPython/Jupyter except that the UI is probably more appealing and the amount of interpreters supported are richer, at the time of writing this Blog Zeppelin supported:
with this rich set of interpreters provided, it makes on boarding platforms like Apache Hadoop or Data Lake concepts much easier where data is sitting and consolidated somewhere and different organizational units with different skill sets needs to access the data and perform their day to day duties on it as data discovery, queries, data modelling, data streaming and finally Data Science using Apache Spark.
With the notebook style editor and the ability to save notebooks on the fly, you can end up with some really cool notebooks, whether you are a data engineer, data scientist or a BI specialist.
Zeppelin also got a basic clean visualization views integrated with it, it also gives you control over what do you want to include in your graph by dragging and dropping fields in your visualization as below:
Also when you are done with your awesome notebook story, you can easily create a report out of it and either print it or send it out.
If you have never played with Zeppelin before then visit this link for a quick way to start working it out using the latest Hortonworks tutorial we are including Zeppelin as part of HDP as a technical preview, which may supporting it officially may follow, check it out Here try out the different interpreters and how it interacts with Hadoop.
I was recently given access to the beta version of Hub, Hub is supposed to make life in organizations easier when it comes to sharing notebooks between different departments or pepole within the organization.
Lets assume an Organization got Marketing, BI and Data Science practices, the three departments overlaps with each other when it comes to the datasets being used, therfore there is no need anymore for each department to work completely isolated from the others, as they can share their experience together, brag about their notebooks, work together on the same notebook when trying to work on either complicated notebook or different skills are required.
Lets have a deeper look at Hub...
Instance is backed by a Zeppelin installation somewhere (server,laptop,hadoop..etc), every time you create a new Instance a new Token is generated, this token should be added in your local Zeppelin installation under folder /incubator_zeppelin/conf/zeppelin-env.sh e.g.
export ZEPPELINHUB_API_TOKEN="f41d1a2b-98f8-XXXX-2575b9b189"
Once the token is added, you will be able to see the notebooks online whenever you connect to Hub (http://zeppelin.hub.com).
once an instance is added, you will be able to see all the notebook for each instance, and since every space is actually either a dept. or a category of notebooks that needs to be shared across certain people, you can easily drag and drop notebooks into spaces making them shared across this specific space.
Very cool !
Since its beta, there is still much of work to be done like executing notebooks from Hub directly, resizing and formatting and some other minor issues, I am sure the All Stars team @nflabs will make it happen very soon as they always did.
if you are interested in playing with Beta, you may request access on Apache Zeppelin website here
Hortonworks is heavily adopting Apache Zeppelin, that showed in the contribution they have made into the product and into Apache Ambari, @ali one of Rockstars at Hortonworks created an Apache Zeppelin View on Ambari, which gives Zeppelin authentication and allows users to have a single pane of glass when it comes to uploading datasets using HDFS view on Apache Ambari Views and other operational needs.
If you want to integrate Zeppelin in Ambari with Apache Spark as well, just easily follow the steps on this link
Recently we have published a Gallery where anyone can contribute and add their notebooks publicly in order to share their notebooks, all what you need to do is to grab the notebook folder and upload check it out here
If you are not sure how to start, a great way is to take a look at Hortonworks Gallery for Apache Zeppelin, you will be able to have a 360 view on different ways of creating different notebooks
Project Helium is a revolutionary change in Zeppelin, Helium allows you to integrate almost any standard html, css, javascript as a visualization or a view inside Zeppelin.
Helium Application would consists of an View, Algortihm and an Access to the resource, you can get more information of Helium here
Created on 11-09-2015 01:55 PM
@Ned Shawa Nice article! good candidate for official blog
Created on 11-09-2015 02:10 PM
Worth mentioning our Zeppelin Gallery:
Created on 11-11-2015 05:34 PM
@Ned Shawa I got access to the Zeppelin Hub today, can I use the zeppelin part of the HDP 2.3.2 sandbox or do i need to install from apache zeppelin github?
Created on 11-11-2015 10:33 PM
you can, as long as you modify the ZEPPELIN HUB API TOKEN and you have a direct internet connection from the Sandbox
Created on 11-11-2015 10:54 PM
Sweet will try it soon.
Created on 11-12-2015 12:11 AM
Got it syncing to the hub! So if i understand this correct, now if I want to sync these notebooks to another zeppelin, i just put in the same "hub_api_token" in that zeppelin and will it sync to that zeppelin instance? Or is that a feature that's not developed yet?