Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference. Information and links may no longer be available or relevant.

How do I add github dependency to spark?

Expert Contributor

Can someone explain what I need to do to get the Stanford CoreNLP wrapper for Apache Spark working in Zeppelin/Spark, please?

I have done this:

%spark.dep
z.reset() // clean up previously added artifacts and repositories
// add artifact recursively
z.load("databricks:spark-corenlp:0.2.0-s_2.10")

and this:

import com.databricks.spark.corenlp.functions._
val dfLemmas= filteredDF.withColumn("lemmas", lemmas('noURL)).select("racist", "filtered","noURL", "lemmas")
dfLemmas.show(20, false)

but I get this

<console>:42: error: not found: value lemmas
       val dfLemmas= filteredDF.withColumn("lemmas", lemmas('noURL)).select("racist", "filtered","noURL", "lemmas")

Do I have to download the files and build them or something? If so, how do I do that?

Or is there an easier way?

TIA!!!!

1 ACCEPTED SOLUTION

Expert Contributor

Here is how you do it:

  1. Get the package's 'name' from its Spark Packages listing. Spark 2.1 needs the Scala 2.11 build, so the name is: databricks:spark-corenlp:0.2.0-s_2.11.
  2. Edit the spark2 interpreter in Zeppelin's interpreter settings and add that name as a dependency. Save, and allow the interpreter to restart.
  3. In Zeppelin:
%spark.dep
z.reset() // clean up previously added artifacts and repositories
z.load("databricks:spark-corenlp:0.2.0-s_2.11")

