Created on 03-17-2015 12:17 AM - edited 09-16-2022 02:24 AM
Hi All,
Since my Hadoop cluster capacity is low and there is no business need to keep old data, I'm trying to find and delete records older than 200 days in HBase tables. I found that there is no tool or ready-to-use program available to achieve this.
Can someone suggest the best approach to accomplish this? Should I write an MR job? If yes, is there any pseudo code or an algorithm to follow?
Thanks
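For anyone looking for the pseudo code asked about here: below is a minimal client-side sketch (no MapReduce, so only suitable for modestly sized tables) that scans for rows with cells older than 200 days and deletes those cells. It assumes an HBase 1.x-era Java client, that cell timestamps reflect insertion time, and a hypothetical table name my_table. TTL, discussed in the replies below, is usually the simpler option.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

public class PurgeOldRows {
    public static void main(String[] args) throws IOException {
        // Cut-off: anything with a cell timestamp older than 200 days gets deleted.
        long cutoff = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(200);

        Configuration conf = HBaseConfiguration.create();
        TableName tableName = TableName.valueOf("my_table");   // hypothetical table name

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(tableName);
             BufferedMutator mutator = conn.getBufferedMutator(tableName)) {

            // Only look at cells written before the cut-off, and return just the first
            // cell per row, so we collect row keys cheaply.
            Scan scan = new Scan();
            scan.setTimeRange(0, cutoff);
            scan.setFilter(new FirstKeyOnlyFilter());
            scan.setCaching(1000);

            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    // Delete only cells up to the cut-off, in case the row also has newer data.
                    mutator.mutate(new Delete(r.getRow(), cutoff));
                }
            }
            mutator.flush();
        }
    }
}
```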
Created 03-17-2015 12:20 AM
Created 03-17-2015 04:26 AM
Thank you. TTL looks like a good option. But I remember major compaction running for days, and when we kept frequent/periodic compaction enabled, regions were going offline. How can we optimize and control the compactions? To enable TTL, do we have to compromise on region availability?
Please guide me.
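For reference, here is a minimal sketch of setting a 200-day TTL on an existing column family through the Java Admin API (HBase 1.x-era calls, some of which are deprecated in HBase 2; the table name my_table and family cf are hypothetical). Note that TTL-expired cells stop being returned by reads right away, but the disk space is only reclaimed when a compaction rewrites the store files, which is why the compaction question above matters.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class SetColumnFamilyTtl {
    public static void main(String[] args) throws IOException {
        // 200 days expressed in seconds; shell equivalent:
        //   alter 'my_table', {NAME => 'cf', TTL => 17280000}
        int ttlSeconds = 200 * 24 * 60 * 60;

        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {

            TableName table = TableName.valueOf("my_table");   // hypothetical table/family names
            HTableDescriptor htd = admin.getTableDescriptor(table);
            HColumnDescriptor cf = htd.getFamily(Bytes.toBytes("cf"));

            // Fetch the existing family descriptor so only the TTL changes and the
            // family's other settings (compression, versions, etc.) are preserved.
            cf.setTimeToLive(ttlSeconds);
            admin.modifyColumn(table, cf);

            // Expired cells are hidden from reads immediately; the space is reclaimed
            // the next time a (major) compaction rewrites the store files.
        }
    }
}
```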
Created 03-22-2015 11:43 PM
Hi Harsh,
The TTL option works well on most of the tables/cases. But Flume agents load data into the staging tables continuously, and when we run compaction there, the regions go offline and the data load fails, so I had to turn off major compaction. Can you help me with how to handle major compaction on these tables so that TTL still purges the old data?
Thanks
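In case it is useful, here is a hedged sketch of one way to handle this: override the periodic major-compaction interval for the staging table only, then trigger a major compaction yourself from a scheduler during a quiet window so that TTL-expired cells still get purged. The table name staging_table is hypothetical, and the per-table override of hbase.hregion.majorcompaction is an assumption about the setup (the same property can instead be set cluster-wide in hbase-site.xml).

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class OffPeakMajorCompaction {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        TableName staging = TableName.valueOf("staging_table");   // hypothetical name

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {

            // One-time setup: override the periodic major-compaction interval for this
            // table only (0 = never run it automatically), so it cannot kick in while
            // the Flume agents are loading. The cluster-wide default stays untouched.
            // (On older versions the table may need to be disabled for the schema change.)
            HTableDescriptor htd = admin.getTableDescriptor(staging);
            htd.setConfiguration("hbase.hregion.majorcompaction", "0");
            admin.modifyTable(staging, htd);

            // Scheduled part (e.g. from cron or Oozie in a quiet window): request a major
            // compaction explicitly so TTL-expired cells are physically removed.
            // The call is asynchronous and returns immediately.
            admin.majorCompact(staging);

            // If you need to wait for it to finish, admin.getCompactionState(staging)
            // can be polled until it reports NONE again.
        }
    }
}
```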