3rd party packages in spark and zeppelin

Are there any differences in loading new 3rd-party packages between the CLI and Zeppelin, if I am using Zeppelin as the notebook?

1) CLI:

spark-shell --packages com.databricks:spark-csv_2.11:1.1.0

or

2) Zeppelin:

// add the artifact recursively, excluding the comma-separated GroupId:ArtifactId list
z.load("groupId:artifactId:version").exclude("groupId:artifactId,groupId:artifactId, ...")
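For instance, a %dep paragraph equivalent to the spark-shell --packages example above might look like this (a sketch assuming the older %dep-style dependency loading; z.reset() is optional and just clears previously loaded artifacts):

```scala
%dep
// must run before the Spark interpreter has been initialized
z.reset()
z.load("com.databricks:spark-csv_2.11:1.1.0")
```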

1 ACCEPTED SOLUTION


Re: 3rd party packages in spark and zeppelin

I believe it's the same. You just need to make sure that you don't already have a Spark application master up when you run the Zeppelin cell that declares the %dep (otherwise it will not get loaded). If needed, you can stop an existing Spark application master by restarting the Spark interpreter via the Interpreter tab in the Zeppelin UI.

More details on Zeppelin dependency loading in docs: https://zeppelin.incubator.apache.org/docs/interpreter/spark.html#dependencyloading
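As a sketch, the ordering matters like this (the CSV path and header option are placeholders, not from the thread):

```scala
// Paragraph 1 — run this FIRST, before any %spark paragraph has started the Spark AM
%dep
z.load("com.databricks:spark-csv_2.11:1.1.0")

// Paragraph 2 — only now initialize the Spark interpreter
%spark
val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .load("/tmp/data.csv")   // placeholder path
```

If a %spark paragraph has already run, restart the Spark interpreter first and then re-run the %dep paragraph.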


4 REPLIES

Re: 3rd party packages in spark and zeppelin


The other thing to note is that to use Spark Packages, you also need

z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")

in the dep paragraph. There is currently a bug in the Zeppelin loader that prevents it from bringing in transitive dependencies, which we are working on; so for spark-csv, for example, you may also have to add the opencsv dependencies explicitly.
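Putting both points together, the dep paragraph might look like this (a sketch; the exact opencsv coordinates are an assumption about which transitive dependency needs to be added by hand):

```scala
%dep
z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
z.load("com.databricks:spark-csv_2.11:1.1.0")
// workaround for the loader bug: declare the transitive dependency explicitly
z.load("net.sf.opencsv:opencsv:2.3")   // assumed coordinates
```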


Re: 3rd party packages in spark and zeppelin

@Ali Bajwa @Simon Elliston Ball Does that mean I have to stop my Zeppelin daemon first to pick up the new 3rd-party packages? I am getting this error now: "Must be used before SparkInterpreter (%spark) initialized"

I thought that when I created a new notebook, I would get a new context. But it looks global; am I missing something?


Re: 3rd party packages in spark and zeppelin

The first time you run a cell containing Spark code, Zeppelin brings up a Spark AM on YARN; it does not do so every time you create a new notebook. The Zeppelin cell that declares the %dep should be run before the Spark AM comes up. If needed, you can stop an existing Spark application master by restarting the Spark interpreter via the Interpreter tab in the Zeppelin UI.
