CoreNLP Server on Spark

New Contributor

I am new to Cloudera and to Spark. I tested my code against the CoreNLP server. If I deploy this code on Spark and expect it to run in parallel on multiple documents, won't the CoreNLP server become a bottleneck and even a single point of failure?

When I searched for "CoreNLP Server on Spark", the results pointed me to the Databricks project: https://github.com/databricks/spark-corenlp

How do I make the NLP tasks running on Spark independent of a singleton service? Or am I missing something?

1 REPLY

Rising Star
Hello,

AFAIK, the Stanford CoreNLP wrapper for Apache Spark should not be a bottleneck for parallel processing. Spark takes care of running it in parallel across multiple documents. Regardless of the number of documents, the number of API requests to the CoreNLP server would remain the same.
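
To make the "independent of a singleton service" part concrete, one common pattern is to build the NLP pipeline inside each Spark task, once per partition, so no task has to call a shared server. Below is a minimal sketch of that pattern in plain Python. `WhitespacePipeline` and `annotate_partition` are hypothetical names made up for illustration, and the pipeline only splits on whitespace as a stand-in for a real in-process CoreNLP pipeline. It is not taken from the spark-corenlp project.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


class WhitespacePipeline:
    """Hypothetical stand-in for an in-process NLP pipeline.

    A real job would load CoreNLP models here; this one only splits on
    whitespace so the sketch runs without any dependencies.
    """

    instances_created = 0  # counts constructions, to show one per partition

    def __init__(self) -> None:
        WhitespacePipeline.instances_created += 1

    def annotate(self, text: str) -> List[str]:
        return text.split()


def annotate_partition(
    docs: Iterable[str],
    make_pipeline: Callable[[], WhitespacePipeline] = WhitespacePipeline,
) -> Iterator[Tuple[str, List[str]]]:
    """Annotate every document in one partition with one local pipeline."""
    pipeline = make_pipeline()  # built once per partition, not per document
    for doc in docs:
        yield doc, pipeline.annotate(doc)


# Local simulation of two partitions (assumed sample data).
partitions = [["Spark runs tasks", "in parallel"], ["CoreNLP annotates text"]]
results = [list(annotate_partition(iter(p))) for p in partitions]
```

With PySpark, a function shaped like this could be passed to `rdd.mapPartitions(annotate_partition)`, so each executor would work with its own pipeline instance and the cost of loading models would be paid once per partition rather than once per document.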