Created 03-07-2017 07:00 AM
Hi,
I'm using NiFi, and the data flow runs every minute. I'm fetching CSVs from a web service and putting them in HDFS.
Every file inserted into HDFS also needs to be saved to HBase, but I need to delete rows that are more than 2 hours old. For example, if the time now is 11:03 AM and I have rows that were inserted at 9:03 AM, then when the time becomes 11:04 AM I need to delete the rows that were inserted at 9:03 AM. This deletion process should also run every minute, and the deleted records also need to be deleted from Solr + Banana UI.
Is this possible?
Thanks.
Created 03-07-2017 12:29 PM
Deleting rows in HBase is a heavy operation. Instead of managing deletions yourself, let HBase handle it via TTL. Basically you can set an expiration on a column family, or alternatively on individual cells, and the data will be treated as deleted once its time to live expires (cell timestamps are in UTC). https://hbase.apache.org/book.html#ttl
Once a row has expired it will be cleaned up by the standard compaction mechanism.
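To make the 2 hour window from the question concrete, here is a minimal Python sketch of the expiry arithmetic. It only approximates how a TTL is evaluated and is not HBase code; the function name `is_expired`, the table name `mytable` and the column family name `cf` are assumptions made up for illustration.

```python
from datetime import datetime, timezone

# A 2 hour TTL expressed in seconds, the unit used for a column family TTL.
# In the HBase shell this could be set with something like (names hypothetical):
#   alter 'mytable', {NAME => 'cf', TTL => 7200}
TTL_SECONDS = 2 * 60 * 60


def is_expired(written_at, now, ttl_seconds=TTL_SECONDS):
    """Return True when the age of the write exceeds the TTL."""
    age_seconds = (now - written_at).total_seconds()
    return age_seconds > ttl_seconds


# The example from the question: a row written at 9:03 AM.
written = datetime(2017, 3, 7, 9, 3, tzinfo=timezone.utc)
at_1103 = datetime(2017, 3, 7, 11, 3, tzinfo=timezone.utc)
at_1104 = datetime(2017, 3, 7, 11, 4, tzinfo=timezone.utc)

print(is_expired(written, at_1103))  # False: exactly 2 hours old
print(is_expired(written, at_1104))  # True: older than 2 hours
```

With a TTL in place there is no per-minute delete job to schedule in NiFi; the expiry is evaluated against each cell's timestamp.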
Created 03-08-2017 03:02 AM
Thanks, @Artem Ervits, I'll try this one.
Created 03-07-2017 06:57 PM
Solr also supports TTL, although I think if the docs are deleted from HBase they should be deleted in Solr automatically.
In case you are interested in the Solr TTL feature, it's done through an UpdateRequestProcessor (URP). It's currently only documented in Solr's Javadocs: http://lucene.apache.org/solr/6_4_0/solr-core/org/apache/solr/update/processor/DocExpirationUpdatePr... (replace the '6_4_0' part of that URL to get to the javadocs for your version; this URP has existed since Solr 4.8.0).
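As a rough sketch of what indexing with a TTL could look like, the Python below builds a JSON update payload in which each document carries a `_ttl_` field holding a date math expression. This assumes the update chain in solrconfig.xml already includes DocExpirationUpdateProcessorFactory with `ttlFieldName` left at `_ttl_` and with an expiration field and `autoDeletePeriodSeconds` configured; the helper name `solr_doc_with_ttl`, the document id and the `source_s` field are hypothetical.

```python
import json


def solr_doc_with_ttl(doc_id, fields, ttl="+2HOURS", ttl_field="_ttl_"):
    """Build a Solr document dict that carries a per-document TTL.

    The TTL value is Solr date math relative to the time of indexing.
    """
    doc = {"id": doc_id}
    doc.update(fields)
    doc[ttl_field] = ttl
    return doc


# Hypothetical document; this JSON would be sent to the collection's
# update handler by whatever indexes the data.
docs = [solr_doc_with_ttl("row-0903", {"source_s": "webservice"})]
payload = json.dumps(docs)
print(payload)
```

If the documents are already removed from Solr when they disappear from HBase, this is not needed; it is only an option if the two have to expire independently.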