Created 06-23-2016 10:14 PM
I want to add a library and use it in Zeppelin (e.g. spark-csv). I succeeded in adding it to Spark and using it by putting my jar on all nodes and adding spark.jars='path-to-jar' to conf/spark-defaults.conf.
However, when I call the library from Zeppelin it doesn't work (class not found). From my understanding, Zeppelin does a spark-submit, so if the package is already added in Spark it should work. I also tried adding export SPARK_SUBMIT_OPTIONS="--jars /path/mylib1.jar,/path/mylib2.jar" to zeppelin-env.sh, but I get the same problem.
Has anyone succeeded in adding libraries to Zeppelin? Have you seen this problem?
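For reference, here is a quick check to run from a Scala paragraph to see whether the class is visible to the Zeppelin Spark interpreter (a minimal sketch; com.databricks.spark.csv.DefaultSource is the datasource entry point shipped in the spark-csv jar):
%spark
// Minimal classpath check: Class.forName succeeds only if the
// spark-csv jar is visible to the interpreter's classloader.
try {
  Class.forName("com.databricks.spark.csv.DefaultSource")
  println("spark-csv is on the classpath")
} catch {
  case _: ClassNotFoundException => println("spark-csv is NOT on the classpath")
}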
Created 06-23-2016 11:05 PM
Hi @Adel Quazani,
You can add the libraries in Zeppelin with import statements.
For example:
import org.apache.spark.rdd._
import scala.collection.JavaConverters._
import au.com.bytecode.opencsv.CSVReader
Hope that answers your question.
Thanks,
Sujitha Sanku
Created 06-24-2016 05:44 AM
I am talking about libraries that don't come with Spark by default, like spark-csv. This code works with spark-shell but not with Zeppelin (same thing if I use PySpark):
import org.apache.spark.sql.SQLContext

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/tmp/sales.csv")
df.printSchema()

val selectedData = df.select("customerId", "itemId")
selectedData.collect()
Should I add an import statement? Why does this work in Spark directly?
Created 06-24-2016 11:46 PM
See the "Import external library" section of https://community.hortonworks.com/articles/34424/apache-zeppelin-on-hdp-242.html
Since the Databricks spark-csv package is published to Maven, you can just add the following as the first paragraph, before any other paragraph runs.
%dep
z.load("com.databricks:spark-csv_2.10:1.2.0")
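Note that the %dep paragraph must run before the Spark interpreter starts, so restart the interpreter first if any Spark paragraph has already executed. Once it has run, the package can be used from a regular Spark paragraph; a minimal sketch, reusing the sample path from the question above:
%spark
// Read a CSV file with the spark-csv datasource loaded via %dep.
// /tmp/sales.csv is the sample path from the question.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/tmp/sales.csv")
df.printSchema()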
Created 08-26-2016 07:31 PM
Since Zeppelin has evolved, so has the answer to this question. In newer versions it is possible to deploy jar files to the local-repo directory, provided zeppelin.interpreter.localRepo is set properly.
zeppelin.interpreter.localRepo /usr/hdp/current/zeppelin-server/lib/local-repo/2BS9Q4FMD
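For example (a sketch; the interpreter id 2BS9Q4FMD and the jar path are specific to one installation and will differ on yours), copy the jar into that directory and then restart the interpreter from the Zeppelin UI:
cp /path/mylib1.jar /usr/hdp/current/zeppelin-server/lib/local-repo/2BS9Q4FMD/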
Created 01-06-2017 07:55 PM
You can add jar files straight under Interpreter dependencies
Load Dependencies to Interpreter
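For the spark-csv case discussed in this thread, that means opening the Spark interpreter settings, adding the Maven coordinate below as an artifact under Dependencies, and restarting the interpreter (coordinate taken from the accepted answer above):
com.databricks:spark-csv_2.10:1.2.0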
Regards, George Davy
Created 03-14-2018 02:21 PM
I use Zeppelin and this worked for me. Thanks @gdavy
Created 04-10-2018 09:55 AM
I looked in the Maven repository, searched for my library (XMLHttpRequest in my case), and found the required info (groupId:artifactId:version). Then I added the artifact to my interpreter as org.webjars.npm:xmlhttprequest:1.8.0 and restarted the interpreter (mongodb), but I don't know how to use the artifact. I tried to import it, but it won't let me, and it also doesn't detect the XMLHttpRequest class automatically. How can I use artifacts?
Created 05-21-2021 01:33 AM
Thanks, this works for me as well.
Created 04-09-2018 05:52 PM
Adel,
I have the same issue: I have Spark 1.6 and I need to use spark-csv. Can you tell me what I need to do, please?
And for Zeppelin, does it work for you?