Support Questions

NavRa · 07-12-2020

I am new to Cloudera & to Spark. I tested my code against the CoreNLP server. If I deploy this code on Spark & expect it to run in parallel on multiple documents, won't the CoreNLP server become a bottleneck & even a single point of failure?

When I searched for "CoreNLP Server on Spark", the results took me to the Databricks wrapper: https://github.com/databricks/spark-corenlp

How do I make the NLP tasks running on Spark independent of a singleton service? Or am I missing something?

gsthina · 07-14-2020

Hello,

AFAIK, the Stanford CoreNLP wrapper for Apache Spark should not be a bottleneck for parallel processing. Spark takes care of running it in parallel across multiple documents. Regardless of the number of documents, the number of API requests to the CoreNLP server would remain the same.
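To make the "no singleton service" idea concrete, here is a minimal sketch of one common pattern: each partition builds its own in-process pipeline and reuses it for every document in that partition, so no task depends on a shared server. `LocalPipeline` and `annotate_partition` are hypothetical names made up for illustration, and the whitespace tokenizer is only a placeholder for real annotation; this is not the spark-corenlp API.

```python
from typing import Iterable, Iterator, List, Tuple


class LocalPipeline:
    """Hypothetical stand-in for an in-process NLP pipeline (not a CoreNLP API)."""

    instances_created = 0  # counts constructions, to show the per-partition cost

    def __init__(self) -> None:
        LocalPipeline.instances_created += 1

    def tokenize(self, text: str) -> List[str]:
        # Placeholder logic: whitespace split instead of real annotation.
        return text.split()


def annotate_partition(docs: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Build one pipeline per partition, then reuse it for every document in it."""
    pipeline = LocalPipeline()
    for doc in docs:
        yield doc, pipeline.tokenize(doc)


if __name__ == "__main__":
    # Two lists simulate two Spark partitions; no cluster is needed to try the pattern.
    partitions = [["Spark runs tasks.", "Each task is local."], ["No shared server."]]
    for part in partitions:
        for doc, tokens in annotate_partition(part):
            print(doc, "->", tokens)
```

On a real cluster the same function could be passed to PySpark's `RDD.mapPartitions`, for example `rdd.mapPartitions(annotate_partition)`, with the placeholder class swapped for whatever local pipeline you actually use. Building the pipeline once per partition rather than once per document matters because loading NLP models is usually the expensive step.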
