Created on 07-19-2016 08:40 PM - edited 09-16-2022 03:30 AM
Hi,
I am working on an RFP and looking for an answer to the following requirement:
Ability to recalculate and alert when there are changes to historical data within a time period within your solution.
What I don't understand is that we cannot modify data in HDFS; it's immutable. So does a change to historical data even apply?
Any help is highly appreciated.
Thanks,
Sujitha
Created 07-20-2016 02:44 AM
@sujitha sanku Here are some thoughts. You're right that data in HDFS is immutable; however, with Hive ACID and Phoenix/HBase you are able to update data. Those products have internal mechanisms that allow updates. At the core, though, the data that exists in HDFS is not truly updated in place: new versions are written alongside the old files, which gives the perception of an update. That is why there is such a thing as major/minor compaction. I'm not going to go into too much detail on that. So if data is updated in HBase, you can use NiFi to detect when a record has changed and create an alert based on that. As for Hive ACID, I am not aware of similar functionality. However, Attunity's products have CDC (change data capture) functionality for Hadoop, so I would reach out to them. If that is not possible, then you can build your own change-tracking functionality for Hive. It would be a custom solution.
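To give a rough idea of what a custom change-tracking solution could look like, here is a minimal sketch in plain Python. It is not a Hive or NiFi API. It assumes you can export the rows for a time period on each run, and it compares a per-row hash against the previous run's snapshot. The column names (`id`, `event_date`, `amount`) and the function names are hypothetical, made up for illustration only.

```python
import hashlib
from datetime import date


def row_fingerprint(row):
    """Hash every column except the key, so any change to the row alters the hash."""
    payload = "|".join(f"{col}={row[col]}" for col in sorted(row) if col != "id")
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def snapshot(rows, start, end):
    """Map row id -> fingerprint for rows whose event_date lies in [start, end]."""
    return {
        row["id"]: row_fingerprint(row)
        for row in rows
        if start <= row["event_date"] <= end
    }


def detect_changes(old_snap, new_snap):
    """Compare two snapshots of the same time period and report the differences."""
    common = old_snap.keys() & new_snap.keys()
    return {
        "changed": sorted(k for k in common if old_snap[k] != new_snap[k]),
        "added": sorted(new_snap.keys() - old_snap.keys()),
        "removed": sorted(old_snap.keys() - new_snap.keys()),
    }


# Hypothetical data: the same historical period as seen on two consecutive runs.
previous_run = [
    {"id": 1, "event_date": date(2016, 7, 1), "amount": 100},
    {"id": 2, "event_date": date(2016, 7, 2), "amount": 250},
]
current_run = [
    {"id": 1, "event_date": date(2016, 7, 1), "amount": 120},  # value was updated
    {"id": 2, "event_date": date(2016, 7, 2), "amount": 250},
    {"id": 3, "event_date": date(2016, 7, 3), "amount": 75},   # late-arriving row
]

period = (date(2016, 7, 1), date(2016, 7, 31))
report = detect_changes(snapshot(previous_run, *period), snapshot(current_run, *period))

# Any non-empty list means historical data changed: recalculate and raise an alert.
if any(report.values()):
    print("Historical data changed:", report)
```

In a real deployment the snapshots would be stored in a table of their own and the comparison would run on the cluster, but the idea is the same: keep a fingerprint per record for the period, and recalculate and alert when a later run disagrees with it.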