HBASE "archive". How to clean? My disk space is vanishing....
Labels: Apache HBase
Created ‎08-04-2017 06:35 PM
hi! So, I'm the sysadmin of a hadoop cluster. I am not a developer, nor do I "use" it. But... I make sure it's running and happy and secure and... so on.
In reviewing HDFS disk use lately, I noticed our numbers are kinda high.
After some digging, it appears all of the space is going into hbase. OK cool, that's what our developers are doing. Stuffing things in hbase.
But I appear to be losing a bunch of disk space to the hbase "archive" folder. I assume that's where hbase puts stuff when tables are deleted or...?
I checked with one of our developers, he sees that in the archive there's tables he deleted long ago.
So... my simple question is, how do I clean out unneeded things from the hbase "archive"? I assume manually deleting stuff via hdfs is **not** the way to go.
hdfs dfs -du -s -h /apps/hbase/data/*
338.6 K /apps/hbase/data/.hbase-snapshot
0 /apps/hbase/data/.tmp
20 /apps/hbase/data/MasterProcWALs
830 /apps/hbase/data/WALs
6.6 T /apps/hbase/data/archive <=== THIS.
0 /apps/hbase/data/corrupt
4.1 T /apps/hbase/data/data
42 /apps/hbase/data/hbase.id
7 /apps/hbase/data/hbase.version
30.7 K /apps/hbase/data/oldWALs
ANY and all help for an hbase newbie would be really appreciated
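For anyone who wants to rank a listing like the one above by size, here is a minimal Python sketch. It assumes the single-size-column format shown in this post (some Hadoop versions print a second column for space consumed including replication) and that the K/M/G/T suffixes are 1024-based. The function names are made up for illustration.

```python
# Sketch: parse `hdfs dfs -du -s -h` style lines and rank paths by size.
# Assumption: one size column per line, binary (1024-based) unit suffixes.

UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}


def parse_du_line(line):
    """Return (size_in_bytes, path) for one line of du output."""
    parts = line.split()
    path = parts[-1]
    number = float(parts[0])
    # A unit letter is present only when the line has three fields.
    unit = parts[1] if len(parts) == 3 else ""
    return int(number * UNITS[unit]), path


def rank_by_size(lines):
    """Parse every non-empty line and sort largest first."""
    parsed = [parse_du_line(line) for line in lines if line.strip()]
    return sorted(parsed, reverse=True)


# The listing from this post, without the "<=== THIS." annotation.
LISTING = """\
338.6 K /apps/hbase/data/.hbase-snapshot
0 /apps/hbase/data/.tmp
20 /apps/hbase/data/MasterProcWALs
830 /apps/hbase/data/WALs
6.6 T /apps/hbase/data/archive
0 /apps/hbase/data/corrupt
4.1 T /apps/hbase/data/data
42 /apps/hbase/data/hbase.id
7 /apps/hbase/data/hbase.version
30.7 K /apps/hbase/data/oldWALs
"""

ranked = rank_by_size(LISTING.splitlines())
total = sum(size for size, _ in ranked)
for size, path in ranked:
    # Share of the HBase root directory taken by each entry.
    print(f"{size / total:6.1%}  {path}")
```

With the numbers above, the archive directory comes out on top, ahead of the live data directory.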
Created ‎08-04-2017 06:58 PM
Check whether the hbase.master.hfilecleaner.ttl configuration property is set in hbase-site.xml. It defines the TTL for archived files.
The archive directory can keep:
1. Old WAL files
2. Old region files after compaction
3. Files for snapshots
I believe that you have some old snapshots and that's why your archive directory is so big. Delete the snapshots that are not required and those files will be deleted automatically.
Created ‎08-04-2017 07:14 PM
As far as I can find, the hbase.master.hfilecleaner.ttl value was not set at all. (Does that then mean... NO cleaning?) I set it to 900000 (15 minutes) and we'll see if anything happens.
Created ‎08-04-2017 07:21 PM
Actually that's supposed to be something like 5 minutes by default. So, check whether you have any old snapshots that you don't need anymore.
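To double-check what TTL is actually in effect, something like the following Python sketch could be used. It reads an hbase-site.xml document and falls back to a default when the property is missing. The 300000 ms fallback is my assumption for the "something like 5 minutes" default mentioned above, and the function names and sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

TTL_KEY = "hbase.master.hfilecleaner.ttl"
# Assumption: 300000 ms (5 minutes) applies when the property is not set.
DEFAULT_TTL_MS = 300_000


def effective_ttl_ms(site_xml):
    """Return the configured TTL in ms, or the assumed default if unset."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == TTL_KEY:
            return int(prop.findtext("value", "").strip())
    return DEFAULT_TTL_MS


def describe(ttl_ms):
    """Render a millisecond TTL together with its value in minutes."""
    return f"{ttl_ms} ms = {ttl_ms / 60_000:g} minutes"


# Hypothetical hbase-site.xml contents with the value set in this thread.
SAMPLE = """<configuration>
  <property>
    <name>hbase.master.hfilecleaner.ttl</name>
    <value>900000</value>
  </property>
</configuration>"""

# The same file without the property, as it was originally.
UNSET = "<configuration></configuration>"

print(describe(effective_ttl_ms(SAMPLE)))  # 900000 ms = 15 minutes
print(describe(effective_ttl_ms(UNSET)))   # 300000 ms = 5 minutes
```

So an unset property should not mean "no cleaning"; if the archive still keeps growing, something other than the TTL is holding on to the files.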
Created ‎08-04-2017 07:26 PM
You're exactly right that you shouldn't delete things by hand 🙂
If you're on >=HDP-2.5.x, make sure to disable the HBase backup feature. This can hold on to archived WALs. You'd want to set hbase.backup.enable=false in hbase-site.xml.
If you have HBase replication set up, that's another potential reason those files are not being removed automatically. Lots of HBase snapshots are another candidate (like Sergey suggested already); drop the old snapshots you don't need anymore.
Turning on DEBUG in the HBase master should give you some insight into the various "Chores" that run inside the Master to automatically remove (or retain) data.
Created ‎08-04-2017 07:28 PM
I am assuming you run major compactions on some regular schedule, probably once a week. So that is not an issue.
Do you have a lot of snapshots? Here is how snapshots work. When you create a snapshot, it only captures metadata at that point in time. If you ever have to restore to that point in time, you restore the snapshot, and the captured metadata tells it which data to restore.
Now, as HBase is running, you might be deleting data. Usually, when a major compaction runs, your deleted data is gone for good and the disk space is recovered. However, if you have snapshots that point to data being deleted, HBase will not delete that data, because you might want to recover to that point in time by restoring the snapshot. In that case, the data the snapshot points to is moved to the archive folder.
The more snapshots you have, the more the archive folder grows to hold the files those snapshots need.
I can only guess, but a reasonable guess about what you are seeing is that you have too many snapshots.
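The retention behavior described above can be illustrated with a toy model in Python. This is only a sketch of the idea, not how HBase's cleaner chores are actually implemented; the class names, file names, and ages are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivedFile:
    name: str
    age_ms: int  # time since the file was moved into the archive


@dataclass
class Snapshot:
    name: str
    files: set = field(default_factory=set)  # names of files it references


def cleanable(archive, snapshots, ttl_ms):
    """Toy rule: a file may go once it is older than the TTL
    and no remaining snapshot references it."""
    referenced = set()
    for snap in snapshots:
        referenced |= snap.files
    return [f.name for f in archive
            if f.age_ms > ttl_ms and f.name not in referenced]


TTL_MS = 300_000  # assumed 5-minute TTL for the example

archive = [
    ArchivedFile("hfile-a", 10_000_000),
    ArchivedFile("hfile-b", 10_000_000),
    ArchivedFile("hfile-c", 60_000),  # younger than the TTL
]
snapshots = [
    Snapshot("snap-old", {"hfile-a"}),
    Snapshot("snap-new", {"hfile-a", "hfile-b"}),
]

# Both old files are pinned by snapshots, so nothing can be removed.
print(cleanable(archive, snapshots, TTL_MS))      # []
# Dropping snap-new releases hfile-b; hfile-a is still pinned by snap-old.
print(cleanable(archive, snapshots[:1], TTL_MS))  # ['hfile-b']
# With no snapshots left, everything past the TTL can go.
print(cleanable(archive, [], TTL_MS))             # ['hfile-a', 'hfile-b']
```

The point of the model is that lowering the TTL does nothing for files a snapshot still references; only deleting the snapshots frees them.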
Created ‎08-04-2017 07:39 PM
yup yup yup. Found the snapshots.... guessing THAT is the culprit. Time to have a conversation with the developers.... there's.. a lot.
Created ‎12-23-2017 03:13 AM
@Kent Brodie
Did you get a solution? Please share
Created ‎12-27-2017 02:48 PM
I deleted all the snapshots and data after getting a go-ahead from the developers...
