I'm working on text analytics, and we have a large collection of scientific texts. I'm thinking about doing real-time text analytics.
My scenario is: the end user searches
a repository (e.g., Elasticsearch), then the result set is analyzed
using Hadoop, Spark, or both to extract topics or concepts, or to run
classic clustering such as K-means. Then the results go back to the end user.
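To make the scenario concrete, here is a toy, single-machine sketch of only the analysis step (search hits in, clusters out). It is not a Hadoop/Spark implementation. It assumes the search has already returned hits shaped like Elasticsearch hits (`_id` plus `_source`), and the `abstract` field name is hypothetical. It uses plain-Python TF-IDF plus K-means. In a distributed setup this step is roughly what Spark's ML library would do with its `HashingTF`/`IDF` feature transformers and `KMeans`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic tokens of 3+ letters (a crude stop-word filter).
    return re.findall(r"[a-z]{3,}", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalized sparse TF-IDF vector (dict term -> weight) per doc."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF: log((1 + n) / (1 + df)) + 1.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    # Squared Euclidean distance between two sparse vectors.
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))


def mean(vectors):
    total = Counter()
    for v in vectors:
        for k, w in v.items():
            total[k] += w
    return {k: w / len(vectors) for k, w in total.items()}


def kmeans(vectors, k, max_iter=20):
    # Deterministic farthest-point initialization so runs are reproducible.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = mean(members)
    return labels, centroids


def analyze_hits(hits, k=2, top_n=3):
    """Cluster search hits and label each cluster with its top TF-IDF terms."""
    texts = [h["_source"]["abstract"] for h in hits]  # hypothetical field name
    labels, centroids = kmeans(tfidf_vectors(texts), k)
    clusters = []
    for j, c in enumerate(centroids):
        top = [t for t, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        ids = [h["_id"] for h, lab in zip(hits, labels) if lab == j]
        clusters.append({"top_terms": top, "doc_ids": ids})
    return clusters


# Made-up hits standing in for an Elasticsearch result set.
hits = [
    {"_id": "1", "_source": {"abstract": "Protein folding simulation with molecular dynamics"}},
    {"_id": "2", "_source": {"abstract": "Molecular dynamics of protein binding sites"}},
    {"_id": "3", "_source": {"abstract": "Galaxy cluster redshift survey and dark matter"}},
    {"_id": "4", "_source": {"abstract": "Dark matter halos in galaxy redshift data"}},
]

for cluster in analyze_hits(hits, k=2):
    print(cluster)
```

The open question is where this step should run (inside the search request path, as a Spark job over the hit IDs, etc.) so that it stays fast enough to feel real-time.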
I'm asking which architecture / pipeline to
use. Do you have any suggestions?