Created on 03-17-2015 12:17 AM - edited 09-16-2022 02:24 AM
Hi All,
Since my Hadoop cluster capacity is low and there is no business need to keep old data, I'm trying to find and delete records older than 200 days in HBase tables. I found that there is no tool or ready-to-use program available to achieve this.
Can someone suggest the best approach to accomplish this? Should I write an MR job? If yes, is there any pseudo code or an algorithm to follow?
Thanks
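For anyone looking for the pseudo code asked about here: below is a minimal client-side sketch (no MapReduce, so only suitable for modestly sized tables) that scans for rows with cells older than 200 days and deletes those cells. It assumes an HBase 1.x-era Java client, that cell timestamps reflect insertion time, and a hypothetical table name my_table. TTL, discussed in the replies below, is usually the simpler option.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

public class PurgeOldRows {
    public static void main(String[] args) throws IOException {
        // Cut-off: anything with a cell timestamp older than 200 days gets deleted.
        long cutoff = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(200);

        Configuration conf = HBaseConfiguration.create();
        TableName tableName = TableName.valueOf("my_table");   // hypothetical table name

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(tableName);
             BufferedMutator mutator = conn.getBufferedMutator(tableName)) {

            // Only look at cells written before the cut-off, and return just the first
            // cell per row, so we collect row keys cheaply.
            Scan scan = new Scan();
            scan.setTimeRange(0, cutoff);
            scan.setFilter(new FirstKeyOnlyFilter());
            scan.setCaching(1000);

            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    // Delete only cells up to the cut-off, in case the row also has newer data.
                    mutator.mutate(new Delete(r.getRow(), cutoff));
                }
            }
            mutator.flush();
        }
    }
}
```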
Created 03-17-2015 12:20 AM
Created 03-17-2015 04:26 AM
Thank you. TTL looks like a good option. But I remember major compaction running for days, and when we kept frequent/periodic compaction enabled, regions were going offline. How can we optimize and control the compactions? To enable TTL, do we have to compromise on region availability?
Please guide me.
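For reference, here is a minimal sketch of setting a 200-day TTL on an existing column family through the Java Admin API (HBase 1.x-era calls, some of which are deprecated in HBase 2; the table name my_table and family cf are hypothetical). Note that TTL-expired cells stop being returned by reads right away, but the disk space is only reclaimed when a compaction rewrites the store files, which is why the compaction question above matters.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class SetColumnFamilyTtl {
    public static void main(String[] args) throws IOException {
        // 200 days expressed in seconds; shell equivalent:
        //   alter 'my_table', {NAME => 'cf', TTL => 17280000}
        int ttlSeconds = 200 * 24 * 60 * 60;

        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {

            TableName table = TableName.valueOf("my_table");   // hypothetical table/family names
            HTableDescriptor htd = admin.getTableDescriptor(table);
            HColumnDescriptor cf = htd.getFamily(Bytes.toBytes("cf"));

            // Fetch the existing family descriptor so only the TTL changes and the
            // family's other settings (compression, versions, etc.) are preserved.
            cf.setTimeToLive(ttlSeconds);
            admin.modifyColumn(table, cf);

            // Expired cells are hidden from reads immediately; the space is reclaimed
            // the next time a (major) compaction rewrites the store files.
        }
    }
}
```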
Created 03-22-2015 11:43 PM
Hi Harsh,
The TTL option works well on most of the tables/cases. But Flume agents load data into the staging tables continuously, and when we run compaction there, the regions go offline and the data load fails, so I had to turn off major compaction. Can you help me with how to handle major compaction on these tables so that TTL still purges the old data?
Thanks
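In case it is useful, here is a hedged sketch of one way to handle this: override the periodic major-compaction interval for the staging table only, then trigger a major compaction yourself from a scheduler during a quiet window so that TTL-expired cells still get purged. The table name staging_table is hypothetical, and the per-table override of hbase.hregion.majorcompaction is an assumption about the setup (the same property can instead be set cluster-wide in hbase-site.xml).

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class OffPeakMajorCompaction {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        TableName staging = TableName.valueOf("staging_table");   // hypothetical name

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {

            // One-time setup: override the periodic major-compaction interval for this
            // table only (0 = never run it automatically), so it cannot kick in while
            // the Flume agents are loading. The cluster-wide default stays untouched.
            // (On older versions the table may need to be disabled for the schema change.)
            HTableDescriptor htd = admin.getTableDescriptor(staging);
            htd.setConfiguration("hbase.hregion.majorcompaction", "0");
            admin.modifyTable(staging, htd);

            // Scheduled part (e.g. from cron or Oozie in a quiet window): request a major
            // compaction explicitly so TTL-expired cells are physically removed.
            // The call is asynchronous and returns immediately.
            admin.majorCompact(staging);

            // If you need to wait for it to finish, admin.getCompactionState(staging)
            // can be polled until it reports NONE again.
        }
    }
}
```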