Best way to achieve custom retention of some rows in a hbase table

Explorer

We have an HBase table with only one column family, and it has a TTL of 15 days. Some of the rows now need to be retained longer than 15 days. What would be the best way to achieve this?

1 ACCEPTED SOLUTION

Re: Best way to achieve custom retention of some rows in a hbase table

Guru

This is not a standard way of using TTLs, so it will probably need a custom solution. How would you decide which rows to keep, and how would you mark them?

A few options that you can evaluate:

1. TTL with MIN_VERSIONS=1. If you set MIN_VERSIONS=1 together with a TTL, HBase effectively keeps at least one version of each cell even after the TTL has expired. However, there is no way to mark individual rows for deletion or retention.

2. Don't use HBase TTLs; do the deletion and expiry on the client side. In this case, you run a custom scanner periodically (every day or so) and delete old data, making sure that you do not delete the "rows that have to be kept around".

3. [EXPERIMENTAL] Use per-cell TTLs. Per-cell TTLs are a relatively new feature, and you can evaluate whether they are useful for your use case. The TTL for the table should be larger than the TTL you set per cell. Cells whose TTL has expired are deleted automatically by compaction. However, you have to set the correct TTL at the time of the write. This feature is also "experimental" and may not be fully supported in HDP.

4. [ADVANCED] Write your own compaction policy. This way, you can implement exactly what you want with high flexibility.
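For option 2, the part that needs care is deciding which rows the periodic job may delete. Here is a minimal sketch of just that decision, in plain Java with no HBase calls. The class `RetentionSweep`, the `RowInfo` holder and the `keepRowKeys` set are hypothetical names for illustration; how you actually mark rows to keep (a flag column, a row key prefix, an external list) is up to you. In a real job the input would come from a scan and the returned keys would be turned into deletes.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the client-side expiry decision; it does not talk to HBase. */
final class RetentionSweep {
    /** Default retention taken from the question: 15 days. */
    static final Duration DEFAULT_RETENTION = Duration.ofDays(15);

    /** Stand-in for what a scan would give you: a row key and its newest cell timestamp. */
    static final class RowInfo {
        final String rowKey;
        final long latestTimestampMillis;

        RowInfo(String rowKey, long latestTimestampMillis) {
            this.rowKey = rowKey;
            this.latestTimestampMillis = latestTimestampMillis;
        }
    }

    /**
     * Returns the keys of rows that are older than the default retention
     * and are not in the set of rows that must be kept around.
     */
    static List<String> rowsToDelete(List<RowInfo> scanned, Set<String> keepRowKeys, Instant now) {
        long cutoffMillis = now.minus(DEFAULT_RETENTION).toEpochMilli();
        List<String> expired = new ArrayList<>();
        for (RowInfo row : scanned) {
            boolean tooOld = row.latestTimestampMillis < cutoffMillis;
            if (tooOld && !keepRowKeys.contains(row.rowKey)) {
                expired.add(row.rowKey);
            }
        }
        return expired;
    }
}
```

Keeping the decision in a pure function like this makes it easy to test before you let the job delete anything from the table.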

Re: Best way to achieve custom retention of some rows in a hbase table

Mentor

@S Roy Use a different column family for those types of records, or set the TTL explicitly per cell.
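If you go the per-cell route, the units are easy to get wrong: the column family TTL is expressed in seconds, while cell TTLs are expressed in milliseconds, and a cell TTL cannot extend a cell's life beyond the column family TTL. A rough sketch of the arithmetic follows, in plain Java with no HBase calls. The class `CellTtlPlan` and the 90-day longer retention are assumptions for illustration; only the 15 days comes from the question. The millisecond value is what you would pass to the client when writing (for example via `Put.setTTL`, if your client version has it).

```java
import java.time.Duration;

/** Hypothetical helper that works out the TTL values; it does not talk to HBase. */
final class CellTtlPlan {
    /** From the question: normal rows live for 15 days. */
    static final Duration SHORT_RETENTION = Duration.ofDays(15);
    /** Assumed value for illustration: rows to be retained longer live for 90 days. */
    static final Duration LONG_RETENTION = Duration.ofDays(90);

    /** Column family TTL, in seconds. It has to cover the longest retention. */
    static long columnFamilyTtlSeconds() {
        return LONG_RETENTION.getSeconds();
    }

    /** Per-cell TTL, in milliseconds, to set at write time. */
    static long cellTtlMillis(boolean retainLonger) {
        Duration retention = retainLonger ? LONG_RETENTION : SHORT_RETENTION;
        return retention.toMillis();
    }
}
```

With this layout the existing 15-day behaviour becomes the per-cell default, and only the rows you flag at write time get the longer lifetime.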
