06-20-2017 11:37 AM
Kudu is an MVCC-based data store. As a result, when an update or delete takes place, a new version of the row is recorded with a tag that marks it as the currently valid version. Subsequent queries return only the current valid version; out-of-date versions are ignored.

Periodically, Kudu runs a background maintenance process that removes old versions of rows to reclaim space. This process is called "compaction". Currently, Kudu does not provide table-level guarantees on how long old versions of a row will remain on the system before compaction. Kudu does allow the user to specify a system-wide "ancient history mark" that defines how long previous row versions must be retained before they are considered eligible for compaction, but for "temporal table support" I think a more granular configuration is required. By default, the ancient history mark is also set to a low value (15 minutes) in order to aggressively reclaim space.

In the direct Kudu API, you can specify a timestamp to use when doing a get(), and if this is set to a time in the past you will get the row as it existed at the provided timestamp. This functionality is not currently accessible through the supported SQL options (Impala, Spark SQL).

So, it's possible to do what you're asking, with these limitations:
1) You have to use the Kudu API
2) You have to be willing to use the same ancient history mark for the entire system
3) You need to set the ancient history mark far enough in the past to be useful for your use case, balanced against the extra space required to keep old row versions around

It's possible that these caveats could be removed with additional work in Kudu and the tools it integrates with, but this is how things work currently.
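To make the trade-off concrete, here is a rough toy model in plain Python of how timestamped reads interact with an ancient history mark. It is only a sketch of the idea, not Kudu's actual storage format or client API: the VersionedRow class, its method names, and the use of integer seconds as timestamps are all assumptions made up for illustration.

```python
from bisect import bisect_right


class VersionedRow:
    """Toy MVCC row (hypothetical, not Kudu code): one value per write timestamp."""

    def __init__(self):
        self._timestamps = []        # write timestamps, kept in ascending order
        self._values = []            # value written at the matching timestamp
        self._ahm = float("-inf")    # ancient history mark; nothing reclaimed yet

    def write(self, ts, value):
        # Assumption for the sketch: writes arrive in strictly increasing time order.
        if self._timestamps and ts <= self._timestamps[-1]:
            raise ValueError("timestamps must be strictly increasing")
        self._timestamps.append(ts)
        self._values.append(value)

    def read(self, ts=None):
        """Return the value visible at ts, or the latest value when ts is None."""
        if ts is None:
            return self._values[-1] if self._values else None
        if ts < self._ahm:
            raise ValueError("history before the ancient history mark was reclaimed")
        i = bisect_right(self._timestamps, ts)   # count of versions written at or before ts
        return self._values[i - 1] if i else None

    def compact(self, now, max_age):
        """Drop versions that no read at or after (now - max_age) can still see."""
        ahm = now - max_age
        i = bisect_right(self._timestamps, ahm)  # versions written at or before the mark
        keep_from = max(i - 1, 0)                # the newest of those is still visible at the mark
        del self._timestamps[:keep_from]
        del self._values[:keep_from]
        self._ahm = max(self._ahm, ahm)

    def version_count(self):
        return len(self._timestamps)


row = VersionedRow()
row.write(100, "v1")
row.write(200, "v2")
row.write(1000, "v3")
before = row.read(250)               # sees "v2", the newest version at time 250
row.compact(now=1500, max_age=900)   # 900 s = 15 minutes, so the mark sits at 600
after = row.read(700)                # still "v2": it is the version visible at the mark
```

After the compaction step only two versions remain, and a read at time 250 raises an error because that point in history is now older than the mark. Raising max_age keeps more history readable at the cost of storing more old versions, which is the balance described in limitation 3.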