
3rd-party packages in Spark and Zeppelin


Is there any difference between loading new 3rd-party packages via the CLI and via Zeppelin, if I am using Zeppelin as the notebook?

1) CLI: spark-shell --packages com.databricks:spark-csv_2.11:1.1.0

or using

2) Zeppelin: // add artifact recursively, except for the comma-separated GroupId:ArtifactId list

z.load("groupId:artifactId:version").exclude("groupId:artifactId,groupId:artifactId, ...")

1 ACCEPTED SOLUTION


I believe it's the same. You just need to make sure that you don't already have a Spark application master up when you run the Zeppelin cell that declares the %dep (otherwise it will not get loaded). If needed, you can stop the existing Spark application master by restarting the Spark interpreter via the Interpreter tab in the Zeppelin UI.

More details on Zeppelin dependency loading are in the docs: https://zeppelin.incubator.apache.org/docs/interpreter/spark.html#dependencyloading
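For example, the ordering described above would look something like this as two notebook paragraphs (a sketch based on the docs linked above; the CSV path and the header option are hypothetical):

%dep
// paragraph 1: run this before any %spark paragraph has started the interpreter
z.load("com.databricks:spark-csv_2.11:1.1.0")

%spark
// paragraph 2: the first %spark paragraph brings up the Spark app master,
// with the artifact loaded above already on its classpath
val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .load("/tmp/cars.csv") // hypothetical path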


4 REPLIES



The other thing to note is that to use Spark Packages, you also need

z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")

in the %dep paragraph. There is currently a bug in the Zeppelin loader that prevents it from bringing in transitive dependencies here, which we are working on; so with spark-csv, for example, you may also have to add the opencsv dependency manually and explicitly as well.
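Putting it together, the %dep paragraph would look roughly like this (a sketch; the opencsv coordinates are an assumption on my part, so check spark-csv's POM for the exact transitive dependencies you need):

%dep
z.reset()
// register the Spark Packages repository
z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
// load spark-csv itself
z.load("com.databricks:spark-csv_2.11:1.1.0")
// workaround for the loader bug: pull in the transitive dependency explicitly
z.load("net.sf.opencsv:opencsv:2.3") // assumed coordinates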


@Ali Bajwa @Simon Elliston Ball does that mean I have to stop my Zeppelin daemon first to pick up the new 3rd-party packages? I am getting this error now: Must be used before SparkInterpreter (%spark) initialized

I thought that when I created a new notebook, I would get a new context, but it looks global. Am I missing something?


The first time you run a cell containing Spark code, Zeppelin brings up the Spark AM on YARN (not every time you create a new notebook). The Zeppelin cell that declares the %dep should be run before the Spark AM comes up. If needed, you can stop the existing Spark application master by restarting the Spark interpreter via the Interpreter tab in the Zeppelin UI.
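So once you hit that error, a recovery sequence would be (a sketch, reusing the spark-csv example from this thread): restart the Spark interpreter from the Interpreter tab, then re-run the %dep paragraph before running any %spark paragraph:

%dep
// safe to run now, since no Spark AM is up after the interpreter restart
z.reset()
z.load("com.databricks:spark-csv_2.11:1.1.0")
// the next %spark paragraph will start a fresh Spark AM with the package available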