Member since
02-04-2023
05-26-2023
02:47 AM
Hi, We have huge and complex XML files. For example: 15-20 levels in the XML tree structure, approximately 180 basic types and 200 complex types, and one-to-many relations between nodes in the tree. As the output, we want to have tables in Hive or Impala and to use SQL to query these tables. Could you please advise how to do that in the most effective way? By effective, I mean minimizing manual coding work. Best regards
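To make the shape of the problem concrete, here is a minimal sketch of what "XML tree to tables" means for one-to-many relations. It is only an illustration under assumptions: the sample document, the element names (`order`, `item`, ...), the `shred` function, and the `_row_id` / `_parent_row_id` columns are all hypothetical, and it uses only the Python standard library, not any Hive or Impala tooling. Each complex element becomes a row in a table named after its tag, and child rows point back to their parent row.

```python
import itertools
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample; the real files are far deeper (15-20 levels).
SAMPLE = """
<order id="1">
  <customer>Acme</customer>
  <item sku="A1"><qty>2</qty></item>
  <item sku="B7"><qty>5</qty></item>
</order>
"""

def is_leaf(elem):
    # A "basic type" here: no child elements and no attributes.
    return len(elem) == 0 and not elem.attrib

def shred(root):
    """Return {tag: [row, ...]}, one list of rows per complex element tag."""
    tables = defaultdict(list)
    ids = itertools.count(1)

    def visit(elem, parent_id):
        row_id = next(ids)
        row = {"_row_id": row_id, "_parent_row_id": parent_id}
        row.update(elem.attrib)  # attributes become columns
        for child in elem:
            if is_leaf(child):
                # Simple child becomes a column on the parent row.
                # Note: a repeated simple child would overwrite here.
                row[child.tag] = (child.text or "").strip()
            else:
                # Complex child becomes a row in its own table.
                visit(child, row_id)
        tables[elem.tag].append(row)

    visit(root, None)
    return dict(tables)

tables = shred(ET.fromstring(SAMPLE))
for name, rows in tables.items():
    print(name, rows)
```

With roughly 200 complex types, a generic walk like this would yield about one table per complex type, which is the part we would like to avoid coding by hand.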
04-26-2023
12:45 PM
Hello, I would like to refresh this topic. Do you know whether it is possible to build an efficient document repository on HDFS? I am concerned about whether storing and retrieving many small files from HDFS would be an effective solution. Best regards Tomek
02-04-2023
02:39 AM
Hello, We are going to build a document repository (Word, PDF, Excel, pptx, ...). Is it a good idea to use HDFS + Solr for such a repository? Key requirements are:
1. Store documents together with some metadata about them
2. Full-text search of documents
3. Search for documents based on their metadata
4. Retrieve documents from the repository
5. In the future, we are going to do Natural Language Processing on Word/PDF documents.
Maybe we should instead use other technologies from the Hadoop ecosystem, such as Ozone, or a database such as HBase? Let's assume that we use CDP Private Cloud. Best regards Tomek
Labels:
- Apache HBase
- HDFS