How do I add a GitHub dependency to Spark?
Labels:
- Apache Spark
- Apache Zeppelin
Expert Contributor
Created 07-04-2017 01:27 PM
Can someone explain what I need to do to get the Stanford CoreNLP wrapper for Apache Spark working in Zeppelin/Spark, please?
I have done this:
%spark.dep
z.reset() // clean up previously added artifacts and repositories
// add artifact recursively
z.load("databricks:spark-corenlp:0.2.0-s_2.10")
and this:
import com.databricks.spark.corenlp.functions._

val dfLemmas = filteredDF.withColumn("lemmas", lemmas('noURL)).select("racist", "filtered", "noURL", "lemmas")
dfLemmas.show(20, false)
but I get this error:
<console>:42: error: not found: value lemmas
       val dfLemmas = filteredDF.withColumn("lemmas", lemmas('noURL)).select("racist", "filtered", "noURL", "lemmas")
Do I have to download the files and build them or something? If so how do I do that?
Or is there an easier way?
TIA!!!!
1 ACCEPTED SOLUTION
Expert Contributor
Created 07-05-2017 04:32 PM
Here is how you do it:
- Get its 'name' from here. Spark 2.1 needs the Scala 2.11 version, so the name is: databricks:spark-corenlp:0.2.0-s_2.11.
- Edit the spark2 interpreter and add the name. Save it and allow it to restart.
- In Zeppelin:
%spark.dep
z.reset()
z.load("databricks:spark-corenlp:0.2.0-s_2.11")
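Once the dependency is loaded, the import from the question should resolve. A minimal sketch of the follow-up usage in a %spark paragraph, assuming filteredDF and its columns ("racist", "filtered", "noURL") from the question, and the lemmas function as the question uses it:

%spark
import com.databricks.spark.corenlp.functions._

// 'noURL relies on the symbol-to-Column implicits that
// Zeppelin's %spark interpreter imports by default
val dfLemmas = filteredDF
  .withColumn("lemmas", lemmas('noURL)) // annotate each row's text with its lemmas
  .select("racist", "filtered", "noURL", "lemmas")

dfLemmas.show(20, false)

Note that %spark.dep must run before the Spark interpreter starts in the note; if Spark has already started, restart the interpreter first, which is why editing the interpreter settings (step 2) is the more reliable route.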
