Support Questions


How to delete data older than x days in HBase tables?

Explorer

Hi All,
 
Since my Hadoop cluster's capacity is low and there is no business need to keep old data, I'm trying to find and delete records older than 200 days in HBase tables. I found that there is no ready-to-use tool or program to achieve this.
 
Can someone suggest the best approach to accomplish this? Should I write an MR job? If so, is there any pseudo-code or algorithm to follow?
 
Thanks


3 REPLIES

Mentor
You should be able to simply set a TTL on your tables and run a major
compaction to delete data older than the TTL. More on TTL at
http://archive.cloudera.com/cdh5/cdh/5/hbase/book.html#ttl.
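A minimal sketch of this approach from the HBase shell, assuming a table named 'mytable' with a column family 'cf' (both placeholder names); depending on your HBase version and configuration you may need to disable/enable the table around the alter:

```shell
# Set a 200-day TTL (in seconds) on an existing column family, then
# trigger a major compaction to physically drop the expired cells.
# 'mytable' and 'cf' are placeholder names.
TTL_SECONDS=$((200 * 24 * 60 * 60))   # 17280000 seconds = 200 days
echo "
alter 'mytable', {NAME => 'cf', TTL => ${TTL_SECONDS}}
major_compact 'mytable'
" | hbase shell
```

Once the TTL is in place, cells older than it stop being returned by reads; the major compaction is what actually reclaims the disk space.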

Explorer

Thank you. TTL looks like a good option. But I remember major compaction running for days, and when we kept frequent/periodic compaction enabled, regions went offline. How can we optimize and control the compactions? To enable TTL, do we have to compromise region availability?

Please guide me

Explorer

Hi Harsh,

The TTL option works well for most tables/cases. But Flume agents load data into the staging tables continuously, so when we run a compaction the regions go offline and the data load fails. As a result I had to turn off major compaction. Can you help me with how to handle major compaction on these tables so that TTL can purge the old data?
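For reference, a pattern sometimes used for tables under continuous ingest (a hedged sketch, not a verified fix; 'staging' is a placeholder table name) is to disable time-based automatic major compactions and trigger them manually during a low-traffic window:

```shell
# In hbase-site.xml, set the automatic major-compaction interval to 0
# to disable time-based major compactions (the default is 604800000 ms,
# i.e. 7 days):
#
#   <property>
#     <name>hbase.hregion.majorcompaction</name>
#     <value>0</value>
#   </property>
#
# Then compact manually during an off-peak window, e.g. from cron:
echo "major_compact 'staging'" | hbase shell
```

Minor compactions keep running in the meantime, so the table stays serviceable between the scheduled major compactions.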

Thanks