Support Questions


Adding libraries to Zeppelin

Contributor

I want to add a library and use it in Zeppelin (e.g. spark-csv). I succeeded in adding it to Spark and using it by putting my JAR on all nodes and adding spark.jars='path-to-jar' in conf/spark-defaults.conf.

However, when I call the library from Zeppelin it doesn't work (class not found). From my understanding, Zeppelin does a spark-submit, so if the package is already added to Spark it should work. I also tried adding export SPARK_SUBMIT_OPTIONS="--jars /path/mylib1.jar,/path/mylib2.jar" to zeppelin-env.sh, but I get the same problem.
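
For reference, here is the shape of what I have in conf/spark-defaults.conf (the paths are placeholders for my actual JAR locations):

spark.jars /path/mylib1.jar,/path/mylib2.jar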

Has anyone succeeded in adding libraries to Zeppelin? Have you seen this problem?

1 ACCEPTED SOLUTION


See the Import External Library section of https://community.hortonworks.com/articles/34424/apache-zeppelin-on-hdp-242.html

Since the Databricks spark-csv package is published to Maven, you can just add the following as the first paragraph, before any other paragraph:

%dep
z.load("com.databricks:spark-csv_2.10:1.2.0")


9 REPLIES

Super Collaborator

Hi @Adel Quazani,

You can add the libraries in Zeppelin with import statements.

For example:

import org.apache.spark.rdd._
import scala.collection.JavaConverters._
import au.com.bytecode.opencsv.CSVReader

Hope that answers your question.

Thanks,

Sujitha Sanku

Contributor

@sujitha sanku

Thanks

I am talking about libraries that don't come with Spark by default, like spark-csv. This code works in spark-shell but not in Zeppelin (the same thing happens if I use PySpark):

import org.apache.spark.sql.SQLContext

// Read the CSV with the spark-csv data source, inferring the schema
val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true").option("inferSchema", "true")
  .load("/tmp/sales.csv")
df.printSchema()
val selectedData = df.select("customerId", "itemId")
selectedData.collect()

Should I add an import statement? And why does this work in Spark directly?

Cloudera Employee

As Zeppelin has evolved, so has the answer to this question. In newer versions it is possible to deploy JAR files to the interpreter's local-repo directory, if the following property is set correctly:

zeppelin.interpreter.localRepo	/usr/hdp/current/zeppelin-server/lib/local-repo/2BS9Q4FMD
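
For example (the interpreter id 2BS9Q4FMD above is specific to my installation; yours will differ):

cp /path/mylib1.jar /usr/hdp/current/zeppelin-server/lib/local-repo/2BS9Q4FMD/

Then restart the interpreter from the Interpreter page so it picks up the JAR.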

New Contributor

You can add JAR files directly under the interpreter's dependencies.

Load Dependencies to Interpreter

  1. Click the 'Interpreter' menu in the navigation bar.
  2. Click the 'edit' button of the interpreter to which you want to load dependencies.
  3. Fill in the artifact and exclude fields as needed; add the path to the respective JAR file (see the example below).
  4. Press 'Save' to restart the interpreter with the loaded libraries.
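
For example, the artifact field accepts either Maven coordinates or a local path; both of the values below appear earlier in this thread:

com.databricks:spark-csv_2.10:1.2.0
/path/mylib1.jar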

Regards, George Davy

Expert Contributor

I use Zeppelin and this worked for me, thanks @gdavy.

New Contributor

I looked in the Maven repository, searched for my library (XMLHttpRequest in my case), and found the required info (groupId:artifactId:version). Then I added the artifact to my interpreter as org.webjars.npm:xmlhttprequest:1.8.0 and restarted the interpreter (mongodb), but I don't know how to use the artifact. I tried to import it, but it won't let me, and it doesn't detect the XMLHttpRequest class automatically either. How can I use artifacts?

New Contributor

Thanks, this works for me as well.

Explorer

Adel,

I have the same issue: I have Spark 1.6 and I need to use spark-csv. Can you tell me what I need to do, please?

And for Zeppelin, does it work for you?