HBase TTL Inter-Column Rules

Assume an HBase table with a rowkey and two column families, CF_1 and CF_2.

CF_1 holds a fixed number of "key columns".

CF_2 holds a variable number of "assignment columns".

TTLs are set on all columns at write time.

The requirement is to expire CF_1 only once all columns in CF_2 have expired. The CF_2 columns are set to expire at different times.

Is there an HBase feature that would implement this requirement in real time, like a rule-based trigger? I am not looking for a script to do it.

Re: HBase TTL Inter-Column Rules

You might be able to implement this with a custom coprocessor, but it would likely be very challenging to get correct.

I'd probably recommend a nightly job to prune the results from your table and add some application logic to ignore such records (until your job runs again).
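The "application logic" part could look roughly like the sketch below. It is plain Java with no HBase dependency, and it assumes that a read no longer returns CF_2 cells whose TTL has passed, so a row that comes back with key columns but no assignment columns can be treated as logically expired. The row model (family name to column map) and the class name `LogicalRowView` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical read-side helper: hides rows whose CF_2 columns are all gone,
// even though their CF_1 columns have not been pruned yet.
class LogicalRowView {
    // Family name taken from the question; adjust to the real schema.
    static final String ASSIGNMENT_FAMILY = "CF_2";

    // A row is logically live only while at least one CF_2 column is still readable.
    public static boolean isLive(Map<String, Map<String, String>> row) {
        Map<String, String> assignments = row.get(ASSIGNMENT_FAMILY);
        return assignments != null && !assignments.isEmpty();
    }

    // Drops logically expired rows from a batch of read results.
    public static List<Map<String, Map<String, String>>> liveRows(
            List<Map<String, Map<String, String>>> rows) {
        List<Map<String, Map<String, String>>> out = new ArrayList<>();
        for (Map<String, Map<String, String>> row : rows) {
            if (isLive(row)) {
                out.add(row);
            }
        }
        return out;
    }
}
```

The nightly job can then reuse the same `isLive` check to decide which rows to delete, so the reader and the pruner agree on what "expired" means.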

Re: HBase TTL Inter-Column Rules

@Josh Elser

Assuming we choose the coprocessor path, would you be concerned about the performance impact, in addition to getting the functionality right?

Re: HBase TTL Inter-Column Rules

Unlike in an RDBMS, most HBase features are cell-oriented rather than row-oriented. For example, TTL is evaluated for each individual cell, not for a row as a whole. Compactions (which is how HBase physically removes expired data) also run on each column family separately, so they never see all of the data for a given row.
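To make the per-cell point concrete, here is a small model in plain Java (not HBase code). It assumes a cell counts as expired once the current time is past its write timestamp plus its TTL; the exact boundary handling inside HBase may differ. The `Cell` class here is a stand-in that carries only what the check needs.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of per-cell TTL evaluation; names are hypothetical.
class CellTtl {
    // Stand-in for a stored cell: column name, write timestamp, and TTL (both in ms).
    public static final class Cell {
        final String column;
        final long timestampMs;
        final long ttlMs;

        public Cell(String column, long timestampMs, long ttlMs) {
            this.column = column;
            this.timestampMs = timestampMs;
            this.ttlMs = ttlMs;
        }
    }

    // Assumed rule: expired once "now" is strictly past timestamp + TTL.
    public static boolean isExpired(Cell cell, long nowMs) {
        return cell.timestampMs + cell.ttlMs < nowMs;
    }

    // Each cell is judged on its own; nothing here looks at the rest of the row.
    public static List<String> expiredColumns(List<Cell> cells, long nowMs) {
        List<String> expired = new ArrayList<>();
        for (Cell cell : cells) {
            if (isExpired(cell, nowMs)) {
                expired.add(cell.column);
            }
        }
        return expired;
    }
}
```

With a row holding `CF_1:key` (TTL 10000 ms) and `CF_2:a` (TTL 1000 ms), both written at time 0, a check at 5000 ms reports only `CF_2:a`. Nothing ties the fate of one cell to the other, which is why the requirement in the question needs extra code.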

However, you can still implement what you want with some amount of code. As Josh suggests, you can actually implement a Filter that will only return rows that match your TTL criteria. Then you can issue deletes for those rows periodically.

Re: HBase TTL Inter-Column Rules

Thanks, @Enis and @Josh Elser. I think that a combination of what you suggested can work:

1. Implement a filter that will return rows that match TTL criteria

2. Run a daily job that sets the TTL on those rows so that the "logical" row expires at compaction time, with the compaction forced to run at off-peak hours.
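One way the daily job in step 2 might work out the TTL for CF_1 is sketched below in plain Java. It assumes the job can read the write timestamp and TTL of each CF_2 cell (both in milliseconds), takes the latest CF_2 expiry as the expiry of the logical row, and derives the TTL a CF_1 cell would need to end at that same instant. The class and method names are hypothetical, and the HBase write itself is left out.

```java
// Hypothetical helper for the daily job; plain arithmetic, no HBase calls.
class Cf1TtlPlanner {
    // Latest expiry instant (timestamp + TTL) across the CF_2 cells, or -1 if there are none.
    public static long latestExpiryMs(long[] cf2TimestampsMs, long[] cf2TtlsMs) {
        if (cf2TimestampsMs.length != cf2TtlsMs.length) {
            throw new IllegalArgumentException("timestamps and TTLs must pair up");
        }
        long latest = -1L;
        for (int i = 0; i < cf2TimestampsMs.length; i++) {
            latest = Math.max(latest, cf2TimestampsMs[i] + cf2TtlsMs[i]);
        }
        return latest;
    }

    // TTL for a CF_1 cell written at cf1TimestampMs so that it ends with the last CF_2 cell.
    // Returns 0 when that instant has already passed; what to do then (for example,
    // delete the row instead of rewriting it) is left to the job.
    public static long ttlForCf1(long cf1TimestampMs, long latestCf2ExpiryMs) {
        return Math.max(0L, latestCf2ExpiryMs - cf1TimestampMs);
    }
}
```

If per-cell TTLs are used, the result would be passed to something like `Put.setTTL` when rewriting the CF_1 cells. Note that a cell TTL cannot extend a cell's lifetime beyond the TTL configured on its column family, so the CF_1 family TTL would need to be at least as long as the longest CF_2 lifetime.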