How to delete data older than x days from HBase tables?
Labels:
- Apache Hadoop
- Apache HBase
Created on 03-17-2015 12:17 AM - edited 09-16-2022 02:24 AM
Hi All,
Since my Hadoop cluster capacity is low and there is no business need to keep old data, I'm trying to find and delete records older than 200 days in my HBase tables. I found that there is no ready-to-use tool or program available to achieve this.
Can someone suggest the best approach to accomplish this? Should I write an MR job? If so, is there any pseudo-code or algorithm?
Thanks
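For what it's worth, here is a rough client-side sketch of the scan-and-delete approach (not a full MR job; the table name "mytable" and the batch size are placeholder assumptions, and the TTL approach suggested in the replies below is usually the simpler route):

    // Rough sketch: scan for cells older than a cutoff and delete them.
    // "mytable" is a placeholder; error handling is omitted for brevity.
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;

    public class PurgeOldRows {
        public static void main(String[] args) throws IOException {
            // cutoff = now minus 200 days, in milliseconds
            long cutoff = System.currentTimeMillis() - 200L * 24 * 60 * 60 * 1000;
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("mytable"))) {
                Scan scan = new Scan();
                scan.setTimeRange(0, cutoff); // return only cells written before the cutoff
                List<Delete> batch = new ArrayList<>();
                try (ResultScanner scanner = table.getScanner(scan)) {
                    for (Result r : scanner) {
                        // delete every cell in this row with timestamp <= cutoff;
                        // newer cells in the same row are left untouched
                        batch.add(new Delete(r.getRow(), cutoff));
                        if (batch.size() >= 1000) {
                            table.delete(batch);
                            batch.clear();
                        }
                    }
                }
                if (!batch.isEmpty()) {
                    table.delete(batch);
                }
            }
        }
    }

Note that the deletes only write tombstones; the disk space is reclaimed at the next major compaction, which is why the TTL discussion below runs into the same compaction questions.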
Created 03-17-2015 12:20 AM
You can set a TTL on the table's column families; HBase then uses major compaction to delete older-than-TTL-time data. More on TTL at
http://archive.cloudera.com/cdh5/cdh/5/hbase/book.html#ttl.
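For example, assuming a table 'mytable' with a column family 'cf' (both placeholder names), the TTL can be set from the HBase shell, with the value given in seconds:

    # 200 days = 200 * 86400 = 17,280,000 seconds
    # (depending on the HBase version, the table may need to be disabled first)
    alter 'mytable', {NAME => 'cf', TTL => 17280000}
    # optionally force a major compaction to purge expired cells right away
    major_compact 'mytable'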
Created 03-17-2015 04:26 AM
Thank you. TTL looks like a good option. But I remember major compaction running for days, and when we kept frequent/periodic compaction enabled, regions were going offline. How can we optimize and control the compactions? To enable TTL, do we have to compromise on region availability?
Please guide me.
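One common way to control this (an assumption on my part, not something confirmed in this thread) is to disable time-based major compactions in hbase-site.xml and trigger them manually during off-peak hours:

    <!-- hbase-site.xml: a period of 0 disables automatic time-based
         major compactions; run "major_compact" manually instead -->
    <property>
      <name>hbase.hregion.majorcompaction</name>
      <value>0</value>
    </property>

TTL-expired cells are already filtered out at read time; compaction only determines when the disk space is actually reclaimed, so delaying major compaction delays reclamation, not correctness.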
Created 03-22-2015 11:43 PM
Hi Harsh,
The TTL option works well for most of the tables/cases. But Flume agents load data into the staging tables continuously, and when we run compaction on them, the regions go offline and the data load fails, so I had to turn off major compaction. Can you help me with how to handle major compaction on these tables so that TTL can purge the old data?
Thanks
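A minimal sketch of running such a manual compaction during a quiet window (the 3 AM schedule and the table name 'staging_table' are placeholder assumptions):

    # cron entry: major-compact the staging table daily at 3 AM, off-peak
    0 3 * * * echo "major_compact 'staging_table'" | hbase shell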
