Support Questions

Find answers, ask questions, and share your expertise

HBASE "archive". How to clean? My disk space is vanishing....

Rising Star

hi! So, I'm the sysadmin of a hadoop cluster. I am not a developer, nor do I "use" it. But... I make sure it's running and happy and secure and... so on.

In reviewing HDFS disk use lately, I noticed our numbers are kinda high.

After some digging, it appears all of the space is going into hbase. OK cool, that's what our developers are doing. Stuffing things in hbase.

But I appear to be losing a bunch of disk space to the hbase "archive" folder, which I assume is where hbase puts stuff when tables are deleted, or...?

I checked with one of our developers; he sees that the archive contains tables he deleted long ago.
So... my simple question is: how do I clean unneeded things out of the hbase "archive"? I assume manually deleting stuff via hdfs is **not** the way to go.

hdfs dfs -du -s -h /apps/hbase/data/*
338.6 K /apps/hbase/data/.hbase-snapshot
0 /apps/hbase/data/.tmp
20 /apps/hbase/data/MasterProcWALs
830 /apps/hbase/data/WALs
6.6 T /apps/hbase/data/archive <=== THIS.
0 /apps/hbase/data/corrupt
4.1 T /apps/hbase/data/data
42 /apps/hbase/data/hbase.id
7 /apps/hbase/data/hbase.version
30.7 K /apps/hbase/data/oldWALs

Any and all help for an hbase newbie would be really appreciated!

3 ACCEPTED SOLUTIONS

Super Collaborator

Check whether you have the hbase.master.hfilecleaner.ttl configuration property in hbase-site.xml. It defines the TTL for archived files.

The archive directory can hold:

1. old WAL files

2. old region files left behind after compaction

3. files referenced by snapshots

I believe you have some old snapshots, and that's why your archive directory is so big. Delete the snapshots that are no longer required and those files will be removed automatically.


Super Guru

You're exactly right that you shouldn't delete things by hand 🙂

If you're on >=HDP-2.5.x, make sure to disable the HBase backup feature. This can hold on to archived WALs. You'd want to set hbase.backup.enable=false in hbase-site.xml.

If you have HBase replication set up, that's another potential reason those files are not being removed automatically. Lots of HBase snapshots are another candidate (as Sergey already suggested) -- drop the old snapshots you don't need anymore.

Turning on DEBUG logging in the HBase Master should give you some insight into the various "Chores" that run inside the Master to automatically remove (or retain) data.


Rising Star

I deleted all the snapshots and data after getting a go-ahead from the developers...


8 REPLIES 8

Super Collaborator

Check whether you have the hbase.master.hfilecleaner.ttl configuration property in hbase-site.xml. It defines the TTL for archived files.

The archive directory can hold:

1. old WAL files

2. old region files left behind after compaction

3. files referenced by snapshots

I believe you have some old snapshots, and that's why your archive directory is so big. Delete the snapshots that are no longer required and those files will be removed automatically.
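For reference, snapshot cleanup looks roughly like this from the HBase shell (the snapshot names below are hypothetical -- list first to see what you actually have):

```shell
hbase shell

# Inside the shell:
list_snapshots                          # shows all snapshots and their creation times
delete_snapshot 'pre_migration_snap'    # hypothetical name -- deletes one snapshot
delete_all_snapshots 'nightly_.*'       # bulk delete by regex (available in recent HBase versions)
```

Once the snapshots are gone, the Master's HFileCleaner chore should reclaim the archived files on its next pass, after the configured TTL expires.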

Rising Star

As far as I can find, the hbase.master.hfilecleaner.ttl value was not set at all (does that then mean... NO cleaning?). I set it to 900000 (15 minutes) and we'll see if anything happens.
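For anyone following along, the property goes in hbase-site.xml like this (the value is in milliseconds, so 900000 is 15 minutes):

```xml
<!-- hbase-site.xml: how long the cleaner keeps files in /apps/hbase/data/archive -->
<property>
  <name>hbase.master.hfilecleaner.ttl</name>
  <value>900000</value> <!-- 15 minutes, in milliseconds -->
</property>
```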


Super Collaborator

Actually, that's supposed to default to something like 5 minutes. So check whether you have any old snapshots that you don't need anymore.

Super Guru

You're exactly right that you shouldn't delete things by hand 🙂

If you're on >=HDP-2.5.x, make sure to disable the HBase backup feature. This can hold on to archived WALs. You'd want to set hbase.backup.enable=false in hbase-site.xml.

If you have HBase replication set up, that's another potential reason those files are not being removed automatically. Lots of HBase snapshots are another candidate (as Sergey already suggested) -- drop the old snapshots you don't need anymore.

Turning on DEBUG logging in the HBase Master should give you some insight into the various "Chores" that run inside the Master to automatically remove (or retain) data.
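As a sketch of the backup suggestion (assuming an HDP-style hbase-site.xml), the property would look like:

```xml
<!-- hbase-site.xml: disable the backup feature so it stops pinning archived WALs (HDP 2.5+) -->
<property>
  <name>hbase.backup.enable</name>
  <value>false</value>
</property>
```

For the DEBUG suggestion, the cleaner chores live under `org.apache.hadoop.hbase.master.cleaner`, so adding something like `log4j.logger.org.apache.hadoop.hbase.master.cleaner=DEBUG` to the Master's log4j.properties should surface why files are being removed or retained.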

Super Guru
@Kent Brodie

I am assuming you run major compactions on a regular schedule, probably once a week, so that is not the issue.

Do you have a lot of snapshots? Here is how snapshots work: when you create a snapshot, it only captures metadata at that point in time. If you ever have to restore to that point in time, you restore the snapshot, and through the metadata that was captured, the snapshot knows which data to restore.

Now, as HBase runs, you might be deleting data. Usually, when major compaction runs, your deleted data is gone for good and the disk space is recovered. However, if you have snapshots pointing to data that is being deleted, HBase will not delete that data -- after all, what if you are trying to recover to that particular point in time by restoring the snapshot? In that case, the data the snapshot points to is moved to the archive folder instead.

The more snapshots you have, the more the archive folder will grow on their behalf.

I can only guess, but a reasonable guess of what you are seeing is that you have too many snapshots.
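One way to confirm that guess: archived HFiles are laid out per-table under archive/data/&lt;namespace&gt;/&lt;table&gt;, so a per-table du shows which tables' snapshots are eating the space (paths assume the default layout from the original post; swap "default" for your namespace if you use others):

```shell
# Size of archived data, broken down by table in the "default" namespace
hdfs dfs -du -s -h /apps/hbase/data/archive/data/default/*
```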

Rising Star

yup yup yup. Found the snapshots.... guessing THAT is the culprit. Time to have a conversation with the developers.... there's.. a lot.

New Contributor

@Kent Brodie
Did you get a solution? Please share

Rising Star

I deleted all the snapshots and data after getting a go-ahead from the developers...