Hello,
We are going to build a document repository (Word, PDF, Excel, pptx, ...).
Is it a good idea to use HDFS + Solr for such a repository?
The key requirements are:
1. Store documents along with some metadata about them
2. Full-text search of documents
3. Search documents by their metadata
4. Retrieve documents from the repository
5. In the future, we plan to run Natural Language Processing (NLP) on the Word/PDF documents.
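As a rough illustration of what the repository has to do, requirements 1 to 4 can be sketched as a toy in-memory Python class. This is only a model under stated assumptions: the class and method names are hypothetical, the plain text is assumed to be already extracted from each file, and nothing here uses HDFS, Solr, Ozone or HBase.

```python
import re
from collections import defaultdict


class InMemoryDocumentRepository:
    """Hypothetical toy model of the four operations; not HDFS or Solr."""

    def __init__(self):
        self._blobs = {}                # doc_id -> raw file bytes (requirements 1 and 4)
        self._metadata = {}             # doc_id -> metadata dict (requirements 1 and 3)
        self._index = defaultdict(set)  # token -> set of doc_ids (requirement 2)

    @staticmethod
    def _tokens(text):
        # Naive tokenizer: lowercase word characters only
        return re.findall(r"\w+", text.lower())

    def store(self, doc_id, content, extracted_text, metadata):
        # extracted_text is assumed to come from some text extractor (not shown)
        self._blobs[doc_id] = content
        self._metadata[doc_id] = dict(metadata)
        for token in self._tokens(extracted_text):
            self._index[token].add(doc_id)

    def full_text_search(self, query):
        # AND semantics: a document must contain every query token
        tokens = self._tokens(query)
        if not tokens:
            return set()
        result = set(self._index.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._index.get(token, set())
        return result

    def metadata_search(self, **criteria):
        # Exact match on every given metadata field
        return {doc_id for doc_id, meta in self._metadata.items()
                if all(meta.get(k) == v for k, v in criteria.items())}

    def retrieve(self, doc_id):
        return self._blobs[doc_id]


repo = InMemoryDocumentRepository()
repo.store("doc-1", b"%PDF-1.7 ...", "Quarterly sales report", {"type": "pdf"})
repo.store("doc-2", b"PK ...", "Sales forecast", {"type": "docx"})
print(repo.full_text_search("sales report"))  # {'doc-1'}
print(repo.metadata_search(type="docx"))      # {'doc-2'}
```

In a real deployment the blob store (HDFS or Ozone), the full-text index (Solr) and the metadata store could be separate systems; the toy class simply keeps all three in dictionaries.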
Or would it be better to use another technology from the Hadoop ecosystem, such as Ozone, or a database such as HBase?
Let's assume that we are using CDP Private Cloud.
Best regards
Tomek