Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference. Information and links may no longer be available or relevant.

How do I add github dependency to spark?

Expert Contributor

Can someone explain what I need to do to get the Stanford CoreNLP wrapper for Apache Spark working in Zeppelin/Spark, please?

I have done this:

%spark.dep
z.reset() // clean up previously added artifacts and repositories
// add artifact recursively
z.load("databricks:spark-corenlp:0.2.0-s_2.10")

and this:

import com.databricks.spark.corenlp.functions._
val dfLemmas= filteredDF.withColumn("lemmas", lemmas('noURL)).select("racist", "filtered","noURL", "lemmas")
dfLemmas.show(20, false)

but I get this

<console>:42: error: not found: value lemmas
       val dfLemmas= filteredDF.withColumn("lemmas", lemmas('noURL)).select("racist", "filtered","noURL", "lemmas")

Do I have to download the files and build them or something? If so, how do I do that?

Or is there an easier way?

TIA!!!!

1 ACCEPTED SOLUTION

Expert Contributor

Here is how you do it:

  1. Get the package's 'name' from its Spark Packages listing. Spark 2.1 needs the Scala 2.11 build, so the name is: databricks:spark-corenlp:0.2.0-s_2.11.
  2. Edit the spark2 interpreter in Zeppelin's interpreter settings and add that name as a dependency. Save, and allow the interpreter to restart.
  3. In Zeppelin:
%spark.dep
z.reset() // clean up previously added artifacts and repositories
z.load("databricks:spark-corenlp:0.2.0-s_2.11")

