Spark NLP provision

Explorer
Hello. Does Spark provide NLP features such as part-of-speech tagging, tokenization, and entity coreference resolution, just like OpenNLP? If so, kindly send us a link or any workarounds. Thanks.

Re: Spark NLP provision

Master Collaborator
Spark has no particular support for NLP, no. You can use third-party
libraries for this.

Re: Spark NLP provision

Explorer

So does this imply that "word2vec" in Spark MLlib is not related to NLP? Is there any mention somewhere of using Stanford NLP with Spark?

Re: Spark NLP provision

Master Collaborator
word2vec is just a means of translating bags of items to a vector
space representation. I myself don't call that NLP per se but it is
used to make feature vectors from text. NLP to me is more like
stemming and sentiment analysis. For this you'd be calling out to
third-party libraries, like the Stanford NLP library, or building your
own NLP processes on top of generic implementations of, say, LDA in
Spark.
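
Once words are mapped to vectors, comparing them usually comes down to
cosine similarity. Below is a minimal plain-Java sketch of that step, not
Spark code. The class name CosineSimilarity, the average helper, and the toy
vectors are made up for illustration. It assumes the word vectors are already
available as double[], for example from model.transform(word).toArray() on an
MLlib Word2VecModel. Averaging word vectors to represent a whole text is one
simple convention, not something Spark prescribes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; nothing here comes from the Spark API.
final class CosineSimilarity {

    /** Cosine of the angle between two vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Assumption: treat a zero vector as having similarity 0 to anything.
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Element-wise mean of word vectors, used here as a crude text vector. */
    static double[] average(List<double[]> vectors) {
        if (vectors.isEmpty()) {
            throw new IllegalArgumentException("need at least one vector");
        }
        double[] mean = new double[vectors.get(0).length];
        for (double[] v : vectors) {
            for (int i = 0; i < mean.length; i++) {
                mean[i] += v[i] / vectors.size();
            }
        }
        return mean;
    }

    public static void main(String[] args) {
        // Toy 3-dimensional vectors standing in for real word2vec output.
        double[] textA = average(Arrays.asList(
                new double[] {0.9, 0.1, 0.0}, new double[] {0.7, 0.3, 0.0}));
        double[] textB = average(Arrays.asList(
                new double[] {0.8, 0.2, 0.1}, new double[] {0.6, 0.2, 0.1}));
        System.out.println("similarity = " + cosine(textA, textB));
    }
}
```

Values near 1 mean the two vectors point in nearly the same direction, and
values near 0 mean they are close to orthogonal.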

Re: Spark NLP provision

Explorer
Thanks. Do we have any Java example of using "word2vec" to compute the cosine similarity between texts?

As per the Javadoc, what does <S> indicate?

I am unable to use
fit(JavaRDD<S> dataset)

<S extends Iterable<String>>
Word2VecModel fit(JavaRDD<S> dataset)
Computes the vector representation of each word in vocabulary (Java version).
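
On the <S> question: in that signature S is a generic type parameter, and the
bound <S extends Iterable<String>> says each element of the dataset must be
something that can be iterated as strings, i.e. one tokenized sentence or
document. A List<String> satisfies that bound, so a JavaRDD<List<String>>
should be accepted by fit. The sketch below shows the same generic shape in
plain Java without Spark. IterableBoundDemo and vocabulary are hypothetical
names, and a java.util.List stands in for JavaRDD so the example runs on its
own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in; it only mirrors the generic shape of
// <S extends Iterable<String>> Word2VecModel fit(JavaRDD<S> dataset).
final class IterableBoundDemo {

    /** Collects the distinct words from a dataset of tokenized sentences. */
    static <S extends Iterable<String>> Set<String> vocabulary(List<S> dataset) {
        Set<String> vocab = new TreeSet<>();
        for (S sentence : dataset) {
            // The bound on S is what allows iterating each element as strings.
            for (String word : sentence) {
                vocab.add(word);
            }
        }
        return vocab;
    }

    public static void main(String[] args) {
        // Each inner list is one tokenized sentence; S is inferred as List<String>.
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("spark", "has", "word2vec"),
                Arrays.asList("word2vec", "maps", "words", "to", "vectors"));
        System.out.println(vocabulary(sentences));
    }
}
```

With Spark on the classpath, the analogous call would be
new Word2Vec().fit(rdd) where rdd is a JavaRDD<List<String>> of tokenized
text. Passing a JavaRDD<String> of raw lines does not satisfy the bound, which
is a common reason the call fails to compile.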