
How do I add a GitHub dependency to Spark?

Expert Contributor

Can someone explain to me what I need to do to get the Stanford CoreNLP wrapper for Apache Spark to work in Zeppelin/Spark, please?

I have done this:

%spark.dep
z.reset() // clean up previously added artifact and repository
// add artifact recursively
z.load("databricks:spark-corenlp:0.2.0-s_2.10")

and this:

import com.databricks.spark.corenlp.functions._
val dfLemmas= filteredDF.withColumn("lemmas", lemmas('noURL)).select("racist", "filtered","noURL", "lemmas")
dfLemmas.show(20, false)

but I get this:

<console>:42: error: not found: value lemmas
       val dfLemmas= filteredDF.withColumn("lemmas", lemmas('noURL)).select("racist", "filtered","noURL", "lemmas")

Do I have to download the files and build them or something? If so, how do I do that?

Or is there an easier way?

TIA!!!!

1 ACCEPTED SOLUTION

Expert Contributor

Here is how you do it:

  1. Get its 'name' from its listing on spark-packages.org. Spark 2.1 needs the Scala 2.11 build, so the name is: databricks:spark-corenlp:0.2.0-s_2.11.
  2. Edit the spark2 interpreter, add the name as a dependency, save it, and allow the interpreter to restart.
  3. In Zeppelin, load it in a %spark.dep paragraph before any Spark code runs (a usage sketch follows after these steps):
%spark.dep
z.reset() 
z.load("databricks:spark-corenlp:0.2.0-s_2.11")

