Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Falcon - hbase

Super Collaborator

Hi,

Falcon currently doesn't support HBase as a feed type. If HBase is to be used anyway, can someone suggest how to implement a retention policy? For HDFS we simply delete the partitioned folder. Is there any such way for HBase?

Thanks,

Avijeet

1 ACCEPTED SOLUTION

Expert Contributor

HBase column families have a time-to-live (TTL) property, specified in seconds, which by default is set to FOREVER. If you wanted HBase cell values to be deleted a week after they are inserted, you could set the TTL to 604800 (the number of seconds in a week: 60 * 60 * 24 * 7).

Here's an example:

Create a table where the column family has a TTL of 10 seconds:

hbase(main):001:0> create 'test', {'NAME' => 'cf1', 'TTL' => 10}
0 row(s) in 2.5940 seconds

Put a record into that table:

hbase(main):002:0> put 'test', 'my-row-key', 'cf1:my-col', 'my-value'
0 row(s) in 0.1420 seconds

If we scan the table right away, we can see the record:

hbase(main):003:0> scan 'test'
ROW          COLUMN+CELL                                                              
my-row-key   column=cf1:my-col, timestamp=1481650256841, value=my-value
1 row(s) in 0.0260 seconds

10 seconds later, the record has disappeared:

hbase(main):004:0> scan 'test'
ROW          COLUMN+CELL
0 row(s) in 0.0130 seconds

So, perhaps you could use TTL to manage your data retention.
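
For a longer retention period, the seconds arithmetic is easy to get wrong by hand. Here is a minimal sketch in Python that converts a retention period in days into a TTL value and prints the matching HBase shell `alter` statement. The helper names `ttl_seconds` and `alter_ttl_command` are made up for illustration, and the table and column family names are the ones from the example above; adjust them for your own schema.

```python
SECONDS_PER_DAY = 60 * 60 * 24


def ttl_seconds(days):
    """Convert a retention period in days to an HBase TTL in seconds."""
    if days <= 0:
        raise ValueError("retention period must be positive")
    return days * SECONDS_PER_DAY


def alter_ttl_command(table, family, days):
    """Build an HBase shell `alter` statement setting the TTL on one column family.

    Hypothetical helper: it only formats a string; it does not talk to HBase.
    """
    return "alter '{}', {{NAME => '{}', TTL => {}}}".format(
        table, family, ttl_seconds(days)
    )


# One week of retention on the 'cf1' family of the 'test' table.
print(alter_ttl_command("test", "cf1", 7))
```

The printed statement can be pasted into the HBase shell to change the TTL on an existing column family instead of setting it at table creation.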


2 REPLIES 2

Super Guru

I don't believe Falcon presently has integration with HBase.
