Best way to achieve custom retention of some rows in a hbase table
- Labels: Apache HBase
Created ‎02-05-2016 11:41 PM
We have an HBase table with only one column family and a TTL of 15 days. Some of the rows now need to be retained longer than 15 days. What would be the best way to achieve this?
Created ‎02-05-2016 11:56 PM
TTLs are not normally used this way, so this will likely need a custom solution. How would you decide which rows to keep, and how would you mark them?
A few options you can evaluate:
1. TTL with MIN_VERSIONS=1. If you set MIN_VERSIONS=1 together with a TTL, HBase effectively makes sure that at least 1 version of each cell is kept around even after the TTL has expired. However, this applies to the whole column family, so there is no way to mark individual rows for deletion or retention.
2. Don't use HBase TTLs; do deletion and expiry on the client side. In this case, you run a custom scanner periodically (every day or so) that deletes old data while making sure it does not delete the rows that have to be kept around.
3. [EXPERIMENTAL] Use per-cell TTLs. Per-cell TTLs are a relatively new feature, and you can evaluate whether they fit your use case. A cell TTL cannot extend a cell's lifetime beyond the column family TTL, so the TTL for the table should be larger than any TTL you set per cell. Cells whose TTLs have expired are automatically deleted during compaction. However, you have to make sure to set the correct TTL at the time of the write. This feature is also "experimental" and may not be fully supported in HDP.
4. [ADVANCED] Write your own compaction policy. This lets you implement exactly the retention behavior you want, with a lot of flexibility.
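To make option 2 a bit more concrete, here is a rough sketch of only the decision step of such a periodic sweep, written in plain Java so it runs without a cluster. Everything in it is an assumption for illustration: the class name `RetentionSweep`, the `keepRows` marker set, and the idea that you have already collected each row's newest cell timestamp. In a real job the timestamps would come from an HBase scan and the returned keys would be turned into delete requests; how you mark rows to keep (a flag column, a lookup table, a row key prefix) is up to you.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: decides which rows a client-side sweep should delete. */
final class RetentionSweep {

    // 15 days, matching the TTL mentioned in the question.
    static final long DEFAULT_RETENTION_MS = Duration.ofDays(15).toMillis();

    /**
     * @param lastWriteMs row key -> newest cell timestamp, in epoch milliseconds
     * @param keepRows    row keys that must be retained (assumed marker set)
     * @param nowMs       current time, in epoch milliseconds
     * @return row keys older than the default retention that are not protected
     */
    static List<String> rowsToDelete(Map<String, Long> lastWriteMs,
                                     Set<String> keepRows,
                                     long nowMs) {
        long cutoff = nowMs - DEFAULT_RETENTION_MS;
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastWriteMs.entrySet()) {
            boolean tooOld = e.getValue() < cutoff;
            boolean keep = keepRows.contains(e.getKey());
            if (tooOld && !keep) {
                expired.add(e.getKey());
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long day = Duration.ofDays(1).toMillis();

        Map<String, Long> rows = new LinkedHashMap<>();
        rows.put("row-a", now - 20 * day); // old, not protected
        rows.put("row-b", now - 20 * day); // old, but marked to keep
        rows.put("row-c", now - 2 * day);  // still within 15 days

        // Only row-a is both past the cutoff and unprotected.
        System.out.println(rowsToDelete(rows, Set.of("row-b"), now));
    }
}
```

Keeping the decision separate from the scan and delete calls makes the rule easy to test on its own, and the same rule could later be reused if you move the logic into a custom compaction policy.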
Created ‎02-05-2016 11:59 PM
