Best way to achieve custom retention of some rows in an HBase table

Contributor

We have an HBase table with a single column family and a TTL of 15 days. Some of the rows now need to be retained longer than 15 days. What would be the best way to achieve this?

1 ACCEPTED SOLUTION

Guru

This is not a standard way of using TTLs, so it may need a custom solution. How would you decide which rows to keep, and how would you mark them?

A few options that you can evaluate:

1. TTL with MIN_VERSIONS=1. If you set MIN_VERSIONS=1 together with a TTL, HBase effectively keeps at least one version of each cell around even after the TTL has expired. However, the setting applies to the whole column family, so there is no way to mark individual rows for deletion or retention.

2. Don't use HBase TTLs; do deletion and expiry on the client side. In this case, you run a custom scanner periodically (every day or so) that deletes old data while making sure it does not delete the rows that have to be kept around.

3. [EXPERIMENTAL] Use per-cell TTLs. Per-cell TTLs are a relatively new feature, and you can evaluate whether they are useful for your use case. The TTL for the table (column family) should be larger than the TTL you set per cell, because a cell TTL cannot extend a cell's lifetime beyond the column family TTL. Cells whose TTLs have expired are deleted automatically by compaction. However, you must make sure to set the correct TTL at write time. This feature is also "experimental" and may not be fully supported in HDP.

4. [ADVANCED] Write your own compaction policy. This way, you can implement exactly what you want with high flexibility.
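
As a rough illustration of option 2, the decision logic of such a periodic sweep can be sketched in plain Java. Everything here is hypothetical: the `RowInfo` holder, its `keep` flag, and the `RetentionSweep` class are stand-ins for illustration only. A real job would fill them from an HBase scan and then issue deletes for the returned row keys.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one scanned row; a real job would populate this
// from an HBase scan rather than build it by hand.
final class RowInfo {
    final String rowKey;
    final long lastWriteMillis; // newest cell timestamp seen in the row
    final boolean keep;         // assumed marker for "retain past 15 days"

    RowInfo(String rowKey, long lastWriteMillis, boolean keep) {
        this.rowKey = rowKey;
        this.lastWriteMillis = lastWriteMillis;
        this.keep = keep;
    }
}

final class RetentionSweep {
    // 15 days, matching the TTL mentioned in the question.
    static final long RETENTION_MILLIS = Duration.ofDays(15).toMillis();

    // A row is deleted only if it is unmarked and older than the retention window.
    static boolean shouldDelete(RowInfo row, long nowMillis) {
        return !row.keep && nowMillis - row.lastWriteMillis > RETENTION_MILLIS;
    }

    // Returns the row keys the periodic job would delete.
    static List<String> rowsToDelete(List<RowInfo> scanned, long nowMillis) {
        List<String> doomed = new ArrayList<>();
        for (RowInfo row : scanned) {
            if (shouldDelete(row, nowMillis)) {
                doomed.add(row.rowKey);
            }
        }
        return doomed;
    }
}
```

How the `keep` marker is stored is left open here; one possibility is a dedicated marker column written alongside the rows that must survive, but that is an assumption and not something the table in the question is known to have.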

2 REPLIES

Master Mentor

@S Roy Either move those types of records to a different column family, or set the TTL explicitly per cell.
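
For the per-cell route, one thing that is easy to get wrong is the units: the column family TTL is configured in seconds, while a cell TTL is expressed in milliseconds, and a cell TTL cannot outlive the column family TTL. The sketch below is only that arithmetic, in plain Java. The `CellTtl` class and its method are hypothetical names, and the 90-day column family TTL in the usage note is an assumed value, not something from the question. The resulting value would then be attached to the write, for example via the Java client's `Put.setTTL`, if your HBase version supports it.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: computes the TTL (in milliseconds) to attach to a cell.
final class CellTtl {
    // requestedDays: how long this particular cell should live.
    // familyTtlSeconds: the column family TTL, which acts as an upper bound.
    static long effectiveCellTtlMillis(long requestedDays, long familyTtlSeconds) {
        long requestedMillis = TimeUnit.DAYS.toMillis(requestedDays);
        long capMillis = TimeUnit.SECONDS.toMillis(familyTtlSeconds);
        // The cell cannot live longer than the column family allows.
        return Math.min(requestedMillis, capMillis);
    }
}
```

With an assumed column family TTL of 90 days (7,776,000 seconds), ordinary rows would be written with `effectiveCellTtlMillis(15, 7_776_000L)`, and rows that need longer retention would be written with a larger value or with no cell TTL at all, so that only the column family TTL applies to them.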