How do I add a GitHub dependency to Spark?
Labels:
- Apache Spark
- Apache Zeppelin
Expert Contributor
Created 07-04-2017 01:27 PM
Can someone explain what I need to do to get the Stanford CoreNLP wrapper for Apache Spark working in Zeppelin/Spark, please?
I have done this:
%spark.dep
z.reset() // clean up previously added artifacts and repositories
// add artifact recursively
z.load("databricks:spark-corenlp:0.2.0-s_2.10")
and this:
import com.databricks.spark.corenlp.functions._

val dfLemmas = filteredDF.withColumn("lemmas", lemmas('noURL)).select("racist", "filtered", "noURL", "lemmas")
dfLemmas.show(20, false)
but I get this error:
<console>:42: error: not found: value lemmas
       val dfLemmas = filteredDF.withColumn("lemmas", lemmas('noURL)).select("racist", "filtered", "noURL", "lemmas")
Do I have to download the files and build them or something? If so how do I do that?
Or is there an easier way?
TIA!!!!
1 ACCEPTED SOLUTION
Expert Contributor
Created 07-05-2017 04:32 PM
Here is how you do it:
- Get its 'name' from here. Spark 2.1 needs the Scala 2.11 version, so the name is: databricks:spark-corenlp:0.2.0-s_2.11.
- Edit the spark2 interpreter and add the name. Save it and allow it to restart.
- In Zeppelin:
%spark.dep
z.reset()
z.load("databricks:spark-corenlp:0.2.0-s_2.11")
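Once the dependency is loaded, the import from the question should resolve. A minimal sketch of the follow-up usage in a %spark paragraph, assuming filteredDF and its columns ("racist", "filtered", "noURL") from the question, and the lemmas function as the question uses it:

%spark
import com.databricks.spark.corenlp.functions._

// 'noURL relies on the symbol-to-Column implicits that
// Zeppelin's %spark interpreter imports by default
val dfLemmas = filteredDF
  .withColumn("lemmas", lemmas('noURL)) // annotate each row's text with its lemmas
  .select("racist", "filtered", "noURL", "lemmas")

dfLemmas.show(20, false)

Note that %spark.dep must run before the Spark interpreter starts in the note; if Spark has already started, restart the interpreter first, which is why editing the interpreter settings (step 2) is the more reliable route.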
