Member since: 02-15-2019
Posts: 2
Kudos Received: 0
Solutions: 0
06-05-2019
09:22 AM
I am facing the same problem. Pepito, how did you solve yours?
04-25-2019
05:37 PM
Hello, I am looking for a storage solution for a CDC Data River. The data will be periodically dumped from a DB into the storage (e.g. HDFS) and passed through the storage from one application to another. Multiple applications might process the data in sequence, and users might want to query the data at each step. They will mostly be interested in the most recent state of the data and will run simple queries, but they might also want to review the state of the data as of some date in the past.

To reduce the load on the applications, I am considering processing only the `diff` of the data relative to the previous run (since I have multiple snapshots of the same table, I expect they will not differ much). I am also considering a near-real-time flow, in which the data arrives with a latency of minutes.

Looking at the solutions available today, I found that Apache Hudi and Databricks Delta match these requirements closely. I would like to know whether the Hortonworks distribution contains a competing tool that I have not come across so far. Thank you!
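For readers unfamiliar with the `diff` idea mentioned above, here is a minimal sketch in plain Python. It assumes each snapshot fits in memory as a list of dicts and that every row has a unique key column; the function name `snapshot_diff` and the column name `id` are hypothetical and chosen only for illustration. It is not how Apache Hudi or Databricks Delta compute changes internally.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def snapshot_diff(previous: List[Row], current: List[Row], key: str = "id") -> Dict[str, List[Row]]:
    """Compare two full snapshots of a table and return only the changed rows.

    Assumption: `key` names a column that uniquely identifies a row
    in both snapshots (hypothetical column name "id" by default).
    """
    prev_by_key = {row[key]: row for row in previous}
    curr_by_key = {row[key]: row for row in current}

    # Rows whose key appears only in the current snapshot.
    inserts = [row for k, row in curr_by_key.items() if k not in prev_by_key]
    # Rows present in both snapshots but with different contents.
    updates = [
        row for k, row in curr_by_key.items()
        if k in prev_by_key and row != prev_by_key[k]
    ]
    # Rows whose key appears only in the previous snapshot.
    deletes = [row for k, row in prev_by_key.items() if k not in curr_by_key]

    return {"insert": inserts, "update": updates, "delete": deletes}


if __name__ == "__main__":
    yesterday = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
    today = [{"id": 1, "v": "a"}, {"id": 2, "v": "B"}, {"id": 4, "v": "d"}]
    for change_type, rows in snapshot_diff(yesterday, today).items():
        print(change_type, rows)
```

Downstream applications would then consume only the three change lists instead of the whole snapshot. At real data volumes the same comparison would more likely be expressed as a keyed join in a distributed engine rather than as in-memory dictionaries.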
Labels:
- Apache Hive