Created 12-12-2016 10:44 AM
Hi,
Falcon currently doesn't support HBase as a feed type. If HBase is to be used anyway, can someone suggest how to implement a retention policy? For HDFS we simply delete by the partitioned folder; is there any such way for HBase?
Thanks,
Avijeet
Created 12-13-2016 05:43 PM
HBase column families have a time-to-live (TTL) property which, by default, is set to FOREVER. The TTL is given in seconds and is measured against each cell's timestamp, so if you wanted HBase cell values to expire a week after being inserted, you could set the TTL to 604800 (the number of seconds in a week: 60 * 60 * 24 * 7).
Here's an example:
Create a table where the column family has a TTL of 10 seconds:
hbase(main):001:0> create 'test', {'NAME' => 'cf1', 'TTL' => 10}
0 row(s) in 2.5940 seconds
Put a record into that table:
hbase(main):002:0> put 'test', 'my-row-key', 'cf1:my-col', 'my-value'
0 row(s) in 0.1420 seconds
If we scan the table right away, we can see the record:
hbase(main):003:0> scan 'test'
ROW                 COLUMN+CELL
 my-row-key         column=cf1:my-col, timestamp=1481650256841, value=my-value
1 row(s) in 0.0260 seconds
10 seconds later, the record has disappeared:
hbase(main):004:0> scan 'test'
ROW                 COLUMN+CELL
0 row(s) in 0.0130 seconds
So, perhaps you could use TTL to manage your data retention.
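For a table that already exists, the same TTL arithmetic can be turned into an `alter` command. The snippet below is only a minimal sketch in plain Ruby: `ttl_seconds` and `alter_ttl_command` are hypothetical helper names, not part of HBase, and the table and column family names are the ones from the example above. It just builds the command text to paste into the HBase shell.

```ruby
# Hypothetical helpers (not part of HBase): turn a retention period in days
# into the number of seconds that the column family TTL attribute expects.
SECONDS_PER_DAY = 60 * 60 * 24

def ttl_seconds(days)
  days * SECONDS_PER_DAY
end

# Build the text of an HBase shell `alter` command that sets the TTL on an
# existing column family. This only produces a string; it does not talk to HBase.
def alter_ttl_command(table, family, days)
  "alter '#{table}', {NAME => '#{family}', TTL => #{ttl_seconds(days)}}"
end

# One-week retention for the 'test' table and 'cf1' family used above.
puts alter_ttl_command('test', 'cf1', 7)
```

After running the generated command in the HBase shell, `describe 'test'` should show the new TTL on the column family.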
Created 12-12-2016 05:54 PM
I don't believe Falcon presently has integration with HBase.