Created 11-04-2015 07:12 PM
Are there any differences in loading new 3rd party packages via the CLI vs. Zeppelin, if I am using Zeppelin as the notebook?
1) CLI: spark-shell --packages com.databricks:spark-csv_2.11:1.1.0
or using
2) Zeppelin: // add artifact recursively, excluding a comma-separated groupId:artifactId list
z.load("groupId:artifactId:version").exclude("groupId:artifactId,groupId:artifactId, ...")
Created 11-04-2015 07:17 PM
I believe it's the same. You just need to make sure that you don't already have a Spark Application Master up when you run the Zeppelin cell which declares the %dep (otherwise it will not get loaded). If needed, you can stop the existing Spark Application Master by restarting the Spark interpreter via the Interpreter tab in the Zeppelin UI.
More details on Zeppelin dependency loading are in the docs: https://zeppelin.incubator.apache.org/docs/interpreter/spark.html#dependencyloading
Created 11-04-2015 07:33 PM
The other thing to note is that to use Spark Packages, you also need
z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
in the dep paragraph. There is currently a bug in the Zeppelin loader which prevents it from bringing in transitive dependencies here, which we are working on; so with spark-csv, for example, you may also have to add the opencsv dependency explicitly.
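Putting the pieces together, a sketch of a complete dep paragraph for spark-csv (the opencsv coordinates below are an assumption for illustration; check which version spark-csv actually depends on):
%dep
// add the Spark Packages repository first
z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
// load spark-csv itself
z.load("com.databricks:spark-csv_2.11:1.1.0")
// workaround for the transitive-dependency bug: load opencsv explicitly
// (net.sf.opencsv:opencsv:2.3 is an assumed coordinate/version)
z.load("net.sf.opencsv:opencsv:2.3")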
Created 11-04-2015 08:15 PM
@Ali Bajwa @Simon Elliston Ball does that mean I have to stop my Zeppelin daemon first to pick up the new 3rd party packages? I am getting this error now: Must be used before SparkInterpreter (%spark) initialized
I thought that when I created a new notebook I would get a new context, but it looks global. Am I missing something?
Created 11-05-2015 03:45 AM
The first time you run a cell containing Spark code, Zeppelin brings up the Spark AM on YARN (not every time you create a new notebook). The Zeppelin cell which declares the %dep should be run before the Spark AM comes up. If needed, you can stop the existing Spark Application Master by restarting the Spark interpreter via the Interpreter tab in the Zeppelin UI.
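So in practice the flow is: restart the Spark interpreter if it is already running, run the %dep paragraph, then run your Spark code. A sketch (the CSV path and option value are hypothetical):
%dep
z.reset()  // clean up previously added artifacts and repositories
z.load("com.databricks:spark-csv_2.11:1.1.0")
Then, only after that paragraph has run:
%spark
// assumes a CSV file at a hypothetical path /tmp/sample.csv
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("/tmp/sample.csv")
df.printSchema()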