Using hadoop/big data to analyze documents(Word/PDF/Text Images)

New Contributor

The processing should extract raw text from all documents and make it available for real-time search through a Java API and REST from web applications. The stored documents should also be available for retrieval during real-time search from the application.

4 REPLIES

Re: Using hadoop/big data to analyze documents(Word/PDF/Text Images)

Expert Contributor

@GAURAV ANAND I'm not sure how to help, since your post states requirements rather than asking a question. Are you asking for an architecture to achieve this?

Re: Using hadoop/big data to analyze documents(Word/PDF/Text Images)

New Contributor

@Matt Andruf, yes, I am looking for an architecture for this, one in which I can ingest documents from various sources such as Kafka and an RDBMS, process them in Hadoop, and also enable real-time search.

Re: Using hadoop/big data to analyze documents(Word/PDF/Text Images)

Expert Contributor

You should check out this tutorial on Solr. I think it covers what you are looking for.
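
As a rough sketch of how the Solr side could look over REST: the snippet below assumes a Solr instance at localhost:8983 with a collection named "documents" (both hypothetical) and the Solr Cell extraction handler enabled at /update/extract. It only builds the request URLs and posts a file; which field a query searches depends on your schema, and scanned text images would additionally need OCR, which this does not cover.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumptions (not from the thread): Solr base URL and collection name.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "documents"


def extract_url(doc_id, commit=True):
    """Build the Solr Cell URL that indexes one uploaded file under doc_id."""
    params = {"literal.id": doc_id, "commit": "true" if commit else "false"}
    return f"{SOLR_BASE}/{COLLECTION}/update/extract?{urlencode(params)}"


def search_url(text, rows=10):
    """Build a /select URL that queries the indexed text and returns JSON."""
    params = {"q": text, "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"


def index_file(path, doc_id):
    """POST a Word/PDF/text file to Solr, which extracts the raw text itself."""
    with open(path, "rb") as fh:
        body = fh.read()
    req = Request(
        extract_url(doc_id),
        data=body,  # supplying data makes this a POST request
        headers={"Content-Type": "application/octet-stream"},
    )
    with urlopen(req) as resp:
        return resp.status
```

A web application would call something like index_file("report.pdf", "doc-1") at ingest time and then fetch search_url("hadoop pdf") for real-time search. The Java API mentioned in the question would be the equivalent calls through a Solr client library.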

Re: Using hadoop/big data to analyze documents(Word/PDF/Text Images)

Super Guru