
Pipeline for real-time text analytics

Hello everyone,

I’m currently working on text analytics over a large corpus of scientific texts, and I’m considering doing it in real time. My scenario is: the end user searches a repository (e.g., Elasticsearch); the result set is then analyzed with Hadoop, Spark, or both to extract topics or concepts, or to run classic clustering such as K-means; finally, the results are returned to the end user.
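To make the scenario concrete, here is a minimal single-machine sketch of the per-query flow (search, then vectorize and cluster the hits, then return the groups). It is only an illustration under stated assumptions: `search` is a hypothetical stand-in for an Elasticsearch query over an in-memory list, the tokenizer is deliberately naive, and TF-IDF plus plain K-means stand in for whatever topic or concept extraction you end up using. In a real deployment the hits would come from Elasticsearch and the vectorizing and clustering would run in Spark (Spark MLlib offers `HashingTF` and `IDF` feature transformers and a `KMeans` estimator).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def search(query, corpus):
    """Hypothetical stand-in for the Elasticsearch query: keep documents that
    contain at least one query term."""
    terms = set(tokenize(query))
    return [doc for doc in corpus if terms & set(tokenize(doc))]


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Raw term count times a smoothed IDF.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in a.keys() | b.keys())


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def kmeans(vectors, k, max_iter=20):
    """Lloyd's K-means; seeds centroids with the first k vectors so runs are repeatable."""
    centroids = [dict(vec) for vec in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(vec, centroids[c])) for vec in vectors]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vec for vec, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = centroid(members)
    return labels


def analyze(query, corpus, k=2):
    """Search, cluster the hits, and return {cluster_id: [documents]} for the user."""
    hits = search(query, corpus)
    if len(hits) < k:
        return {0: hits}  # too few hits to form k clusters
    labels = kmeans(tfidf_vectors(hits), k)
    clusters = defaultdict(list)
    for doc, label in zip(hits, labels):
        clusters[label].append(doc)
    return dict(clusters)


if __name__ == "__main__":
    docs = [
        "protein folding in cells",
        "galaxy formation in the early universe",
        "protein structure prediction in cells",
        "dark matter and galaxy rotation in the universe",
        "recipes for chocolate cake",
    ]
    print(analyze("protein galaxy", docs))
```

Seeding with the first k hits keeps the example deterministic; practical K-means implementations usually use random or k-means++ initialization and choose k per query or per corpus.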

Which architecture or pipeline would you recommend for this? Any suggestions are welcome.

I’m looking forward to your help.

Thank you