
Cloudera and Notebooks (Zeppelin/HUE/Jupyter)

Explorer

Hi there,

we are working on a CDH 5.11 cluster (ADS/LDAP, Kerberos and Sentry enabled).
We are now evaluating a notebook solution.
Workbench (sadly) does not support the same SQL + Spark + Impala + Hive feature set, so we need to look at alternatives.

Hue seems to have stopped improving its notebook feature, so this is out.

Jupyter seems to have everything, but running one instance per user is a bit ... I don't like this approach.

Zeppelin looks nice, but I'm not sure how well it works together with CDH because the support seems to be poor.
I managed to connect Zeppelin <-> ADS, but now I'm facing the Kerberos part.
Is it possible for each Zeppelin step to use and manage the Kerberos tokens of the logged-in user, or do I have to provide one "technical principal" that Zeppelin uses?


Besides that question: Hortonworks fully supports Zeppelin.
What is Cloudera's answer? Which notebook should we (blue elephant guys) use?
Workbench is a good one, but it does not cover everything Zeppelin does and seems to target somewhat different clients/people.

Thanks and BR

4 REPLIES

Master Collaborator

HUE is the supported and recommended tool for SQL (Impala, Hive).

The HUE notebook is not supported.

 

The workbench is the supported and recommended tool for Spark, Python, R, and Scala. Kerberos and security work.

Zeppelin and Jupyter are not supported, and it's safe to say there are no plans to support them.

 

What features are you looking for? HUE + workbench should cover everything you mention. I don't know of a difference with Zeppelin in this respect.


What's a blue elephant guy?

Explorer

Thank you for your answer!

I know that I can cover SQL/Impala through Hue and R + Python + Spark with Workbench, but I don't like the approach.
Most of the time we use SQL (Impala/Hive) for quick data analysis, and then we move to Python/Spark to go deeper or to dev/test our ETL/ELT parts.

When everything is fine, we put the tested lines into our dev IDE (IntelliJ + sbt ... testing) and deploy them.

So Zeppelin is more of an extension of our IDE dev process, in one tool.

 

I know that the dev approach "should" be different when using the workbench, but that's how our process is.

Zeppelin/Jupyter solves it all in one for us.

("Blue elephant guys" was meant as a joke for "Cloudera users", as opposed to the green elephant used by Hortonworks.)

 

 

Master Collaborator

You can execute SQL statements in Pyspark. Same metastore, same data as you are accessing from Hive or Impala.
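
A minimal sketch of what that could look like, assuming a Spark 2 session with Hive support is available (on a Spark 1.6 installation a HiveContext would be the rough equivalent). The table name `default.web_logs`, the column `event_date`, and both helper functions are hypothetical and only there for illustration:

```python
def build_daily_count_query(table, date_column):
    """Build the same kind of aggregate query one would type into Hive or Impala.

    Both arguments are hypothetical placeholders for a real table and column.
    """
    return (
        "SELECT {col}, COUNT(*) AS n FROM {tbl} GROUP BY {col} ORDER BY {col}"
    ).format(col=date_column, tbl=table)


def run_query(query):
    """Run a SQL string through Spark against the shared Hive metastore."""
    # Imported here so the query builder above works without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("sql-from-pyspark")
        .enableHiveSupport()  # use the cluster's Hive metastore for table lookup
        .getOrCreate()
    )
    # spark.sql returns a DataFrame, so the result can be processed
    # further in Python right away.
    return spark.sql(query)


# Example use inside an interactive session (not executed here):
#   df = run_query(build_daily_count_query("default.web_logs", "event_date"))
#   df.show(10)
```

The point of the sketch is only that the quick SQL exploration and the deeper Python/Spark work can happen in the same session, on the same tables.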

 

I think one of the premises of the workbench is: edit code, not notebooks. Because that makes it much more realistic to create code that's then used in 'production'. The translation step is an obstacle.

 

I personally think you should use your IDE to do non-interactive software development, and use the workbench for the interactive parts, all within one project. This was my take on it, for Scala: https://github.com/srowen/cdsw-simple-serving

 

I think Jupyter is harder to fit into this vision because it's operating in terms of notebooks, not code at heart. Zeppelin, less so.


So I think we're actually aligned and think the workbench is trying to do what you want. But that's the answer we provide. You can use Zeppelin but you're on your own, and if it's a little tricky, well yeah that's part of the point.

Explorer
Your point is flawless. I think the issue here (at least on my side) is that the workbench (which I tested in a bootcamp run by Cloudera a year ago) is pretty good, but it isn't cheap either.

For labs, development and the like, it is not affordable for a small company.
In my case, my company (a consultancy) needs to be able to develop a new product or service that makes use of ML techniques and would best be developed in a "shared notebook" fashion. The result would probably be sold to the customer together with the workbench, but of course we need to develop it first, with no guarantee of success.
Although we are Cloudera resellers, there's no guarantee that the customer also wants to buy the CDSW license (maybe a "developer license" would cover this gap).

That's why we need to switch to inexpensive software like Zeppelin and Livy to get the job done, at least in the alpha stage.

This is my point of view.
Take care,
O.