New Contributor
Posts: 1
Registered: ‎11-26-2018

Scala script not running on Spark

Hello,

 

I am new to Cloudera and would like some help running this Scala script.

 

Files: https://github.com/ofermend/practical-data-science-with-hadoop-and-spark/tree/master/ch09

 

I followed the instructions in ingest.sh and every step completed without problems:

 

# ingest Ohsumed text collection
wget http://disi.unitn.it/moschitti/corpora/ohsumed-all-docs.tar.gz
tar -zxvf ohsumed-all-docs.tar.gz
hadoop fs -rm -r ohsumed
hadoop fs -mkdir ohsumed
hadoop fs -put ohsumed-all/* ohsumed/
rm -rf ohsumed-all
rm ohsumed-all-docs.tar.gz

# copy stop-words to HDFS
hadoop fs -put stop-words.txt .

# get openNLP jar
wget http://apache.mirrors.hoobly.com/opennlp/opennlp-1.6.0/apache-opennlp-1.6.0-bin.zip
unzip apache-opennlp-1.6.0-bin.zip
cp apache-opennlp-1.6.0/lib/opennlp-tools-1.6.0.jar .
rm -rf apache-opennlp-1.6.0/
rm apache-opennlp-1.6.0-bin.zip

 

As far as I know, I can run the Scala script with this command:

spark-shell -i lda-script.scala

 

 

But I get this error: 

scala> import opennlp.tools.stemmer.PorterStemmer
<console>:23: error: not found: value opennlp
       import opennlp.tools.stemmer.PorterStemmer

 

What am I doing wrong? I have been searching for a while and have not found an answer. Are there extra steps I am missing?

Thanks,

L

Cloudera Employee
Posts: 76
Registered: ‎03-01-2016

Re: Scala script not running on Spark

The error message says Spark can't find the OpenNLP class:

 

opennlp.tools.stemmer.PorterStemmer

 

You need to check whether opennlp-tools-1.6.0.jar is present in the directory you launch spark-shell from. If it is not, correct and re-run this part of the script:

# get openNLP jar
wget http://apache.mirrors.hoobly.com/opennlp/opennlp-1.6.0/apache-opennlp-1.6.0-bin.zip
unzip apache-opennlp-1.6.0-bin.zip
cp apache-opennlp-1.6.0/lib/opennlp-tools-1.6.0.jar .
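
Even when the jar is present, spark-shell does not normally pick up jars from the working directory on its own, so the jar usually has to be passed explicitly with --jars. Below is a minimal sketch, assuming the jar and lda-script.scala are both in the current directory. The helper name spark_shell_cmd is hypothetical and only prints the command to run; --jars and -i are the standard spark-shell options.

```shell
# Hypothetical helper: verify the jar exists, then print the spark-shell
# command that adds it to the classpath before loading the script.
spark_shell_cmd() {
  jar="$1"
  script="$2"
  if [ ! -f "$jar" ]; then
    # The jar is missing: the download/copy step has to be repeated first.
    echo "missing jar: $jar" >&2
    return 1
  fi
  echo "spark-shell --jars $jar -i $script"
}

# Example call; assumes both files are in the current directory.
spark_shell_cmd opennlp-tools-1.6.0.jar lda-script.scala || true
```

If the helper prints a command, running that command should let the import of opennlp.tools.stemmer.PorterStemmer resolve. If it reports a missing jar, repeat the download and copy steps above first.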

 
