Created 06-23-2016 10:14 PM
I want to add a library and use it in Zeppelin (e.g. spark-csv). I succeeded in adding it to Spark and using it by putting my jar on all nodes and adding spark.jars='path-to-jar' to conf/spark-defaults.conf.
However, when I call the library from Zeppelin it doesn't work (class not found). From my understanding, Zeppelin does a spark-submit, so if the package is already added in Spark it should work. I also tried adding export SPARK_SUBMIT_OPTIONS="--jars /path/mylib1.jar,/path/mylib2.jar" to zeppelin-env.sh, but I get the same problem.
Has anyone succeeded in adding libraries to Zeppelin? Have you seen this problem?
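For reference, here is a quick check to run from a Scala paragraph to see whether the class is visible to the Zeppelin Spark interpreter (a minimal sketch; com.databricks.spark.csv.DefaultSource is the datasource entry point shipped in the spark-csv jar):
%spark
// Minimal classpath check: Class.forName succeeds only if the
// spark-csv jar is visible to the interpreter's classloader.
try {
  Class.forName("com.databricks.spark.csv.DefaultSource")
  println("spark-csv is on the classpath")
} catch {
  case _: ClassNotFoundException => println("spark-csv is NOT on the classpath")
}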
Created 06-23-2016 11:05 PM
Hi @Adel Quazani,
You can add the libraries in Zeppelin with import statements.
For example:
import org.apache.spark.rdd._
import scala.collection.JavaConverters._
import au.com.bytecode.opencsv.CSVReader
Hope that answers your question.
Thanks,
Sujitha Sanku
Created 06-24-2016 05:44 AM
I am talking about libraries that don't come with Spark by default, like spark-csv. This code works with spark-shell but not with Zeppelin (same thing if I use PySpark):
import org.apache.spark.sql.SQLContext

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/tmp/sales.csv")
df.printSchema()

val selectedData = df.select("customerId", "itemId")
selectedData.collect()
Should I add an import statement? Why does this work in Spark directly?
Created 06-24-2016 11:46 PM
See the "Import external library" section of https://community.hortonworks.com/articles/34424/apache-zeppelin-on-hdp-242.html
Since the Databricks spark-csv package is published to Maven, you can just add the following as the first paragraph, before any other paragraph runs.
%dep
z.load("com.databricks:spark-csv_2.10:1.2.0")
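Note that the %dep paragraph must run before the Spark interpreter starts, so restart the interpreter first if any Spark paragraph has already executed. Once it has run, the package can be used from a regular Spark paragraph; a minimal sketch, reusing the sample path from the question above:
%spark
// Read a CSV file with the spark-csv datasource loaded via %dep.
// /tmp/sales.csv is the sample path from the question.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/tmp/sales.csv")
df.printSchema()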
Created 08-26-2016 07:31 PM
Since Zeppelin has evolved, so has the answer to this question. In newer versions it is possible to deploy jar files to the local-repo directory, provided zeppelin.interpreter.localRepo is set properly.
zeppelin.interpreter.localRepo /usr/hdp/current/zeppelin-server/lib/local-repo/2BS9Q4FMD
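For example (a sketch; the interpreter id 2BS9Q4FMD and the jar path are specific to one installation and will differ on yours), copy the jar into that directory and then restart the interpreter from the Zeppelin UI:
cp /path/mylib1.jar /usr/hdp/current/zeppelin-server/lib/local-repo/2BS9Q4FMD/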
Created 01-06-2017 07:55 PM
You can add jar files straight under Interpreter dependencies
Load Dependencies to Interpreter
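For the spark-csv case discussed in this thread, that means opening the Spark interpreter settings, adding the Maven coordinate below as an artifact under Dependencies, and restarting the interpreter (coordinate taken from the accepted answer above):
com.databricks:spark-csv_2.10:1.2.0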
Regards, George Davy
Created 03-14-2018 02:21 PM
I use Zeppelin and this worked for me. Thanks @gdavy
Created 04-10-2018 09:55 AM
I looked in the Maven repository, searched for my library (XMLHttpRequest in my case), and found the required info (groupId:artifactId:version). Then I added the artifact to my interpreter as org.webjars.npm:xmlhttprequest:1.8.0 and restarted the interpreter (mongodb), but I don't know how to use the artifact. I tried to import it, but it won't let me, and it also doesn't detect the XMLHttpRequest class automatically. How can I use artifacts?
Created 05-21-2021 01:33 AM
Thanks, this works for me as well.
Created 04-09-2018 05:52 PM
Adel,
I have the same issue: I have Spark 1.6 and I need to use spark-csv. Can you tell me what I need to do, please?
And for Zeppelin, does it work for you?