Central Document Repository on HDFS
Labels: Apache HBase, HDFS
Created 02-04-2023 02:39 AM
Hello,
We are going to build a document repository (Word, PDF, Excel, PPTX, ...).
Is it a good idea to use HDFS + Solr for such a repository?
Key requirements are:
1. Store documents together with some metadata about them
2. Full-text search of documents
3. Search documents based on their metadata
4. Retrieve documents from the repository
5. In the future, we plan to run Natural Language Processing on Word/PDF documents.
Or would it be better to use another technology from the Hadoop ecosystem, such as Ozone, or a database such as HBase?
Let's assume that we use CDP Private Cloud.
Best regards
Tomek
Created 04-26-2023 12:45 PM
Hello,
I would like to refresh this topic.
Do you know whether it is possible to build an efficient document repository on HDFS?
I am concerned about whether storing and retrieving many small files from HDFS will be an effective solution.
Best regards
Tomek
