Falcon - hbase

Super Collaborator

Hi,

Falcon currently doesn't support HBase as a feed type. If HBase is to be used anyway, can someone suggest how to implement a retention policy? For HDFS we simply delete the partitioned folder, but is there any equivalent for HBase?

Thanks,

Avijeet

1 ACCEPTED SOLUTION

Expert Contributor

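HBase column families have a time-to-live (TTL) property, specified in seconds, which by default is set to FOREVER. The TTL is measured against each cell's timestamp: once a cell is older than the TTL it is no longer returned by reads, and it is removed from disk during compaction. If you wanted HBase cell values to expire a week after being inserted, you could set the TTL to 604800 (the number of seconds in a week: 60 * 60 * 24 * 7).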

Here's an example:

Create a table where the column family has a TTL of 10 seconds:

hbase(main):001:0> create 'test', {'NAME' => 'cf1', 'TTL' => 10}
0 row(s) in 2.5940 seconds

Put a record into that table:

hbase(main):002:0> put 'test', 'my-row-key', 'cf1:my-col', 'my-value'
0 row(s) in 0.1420 seconds

If we scan the table right away, we can see the record:

hbase(main):003:0> scan 'test'
ROW          COLUMN+CELL                                                              
my-row-key   column=cf1:my-col, timestamp=1481650256841, value=my-value
1 row(s) in 0.0260 seconds

10 seconds later, the record has disappeared:

hbase(main):004:0> scan 'test'
ROW          COLUMN+CELL
0 row(s) in 0.0130 seconds

So, perhaps you could use TTL to manage your data retention.
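
If your retention period is defined in days, the TTL value can be derived rather than hard-coded. Here is a minimal Python sketch of that arithmetic. The helper names ttl_seconds and alter_ttl_command are made up for illustration, and the generated alter statement assumes the same shell syntax as the create example above. Depending on your HBase version, you may need to disable the table before altering it.

```python
SECONDS_PER_DAY = 60 * 60 * 24  # 86400


def ttl_seconds(retention_days: int) -> int:
    """Convert a retention period in days to an HBase TTL in seconds."""
    if retention_days <= 0:
        raise ValueError("retention_days must be a positive integer")
    return retention_days * SECONDS_PER_DAY


def alter_ttl_command(table: str, family: str, retention_days: int) -> str:
    """Build an HBase shell statement that sets the TTL on a column family.

    Hypothetical helper: it only formats a string and does not talk to HBase.
    """
    ttl = ttl_seconds(retention_days)
    return f"alter '{table}', {{'NAME' => '{family}', 'TTL' => {ttl}}}"


if __name__ == "__main__":
    # One week of retention on the 'cf1' family of the 'test' table.
    print(alter_ttl_command("test", "cf1", 7))
```

You could paste the printed statement into the HBase shell, then run describe 'test' to check that the new TTL shows up on the column family.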

2 REPLIES

Super Guru

I don't believe Falcon presently has integration with HBase.
