I want to reduce disk usage.
Labels: Apache HBase
Created on ‎08-27-2018 07:33 PM - edited ‎09-16-2022 06:38 AM
HBase version: 0.92.1-cdh4.0.1
I need to delete outdated files because our Hadoop cluster is running out of capacity.
When I checked HBase's disk usage, I found that the .archive directory accounts for about 50% of it:
19,613G /hbase
10,055G /hbase/.archive
What is the .archive directory?
Can I reduce the usage of this directory without losing data?
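As a quick check of the "about 50%" figure, here is a minimal Python sketch that parses the two usage lines quoted above and computes the share taken by .archive. The helper name `parse_size_gb` is made up for illustration, and the sketch assumes every size is given in G with comma thousands separators, as in the post.

```python
def parse_size_gb(line):
    """Parse a line such as '19,613G /hbase' into (path, size_in_gb)."""
    size_text, path = line.split()
    return path, float(size_text.rstrip("G").replace(",", ""))

# The two lines quoted in the question.
usage = dict(parse_size_gb(line) for line in [
    "19,613G /hbase",
    "10,055G /hbase/.archive",
])

# Percentage of the /hbase tree that sits under .archive.
share = usage["/hbase/.archive"] / usage["/hbase"] * 100
print(f"{share:.1f}% of /hbase is under .archive")  # prints "51.3% of /hbase is under .archive"
```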
Created ‎08-27-2018 08:04 PM
/hbase/.archive directory for active snapshot-referenced store-file data
[1], but this feature did not exist in the HBase from CDH 4.0.1. Are you
certain your version is CDH 4.0.1? Or was there perhaps a rollback of an
upgrade from a higher CDH4 version down to CDH 4.0.1 in the past? Or otherwise,
is there another CDH cluster, running a higher CDH4 version, that remotely
copies its snapshots into your cluster via ExportSnapshot?
If you are absolutely sure that nothing in your CDH4 version accesses the
unused /hbase/.archive directory (you can check via NameNode audit logs
over a period of time where HBase is actively in use), and no snapshots
appear to exist ('list_snapshots' command in HBase shell, if it is
available), then you can try removing the /hbase/.archive directory by
first moving the .archive path outside (to /tmp/ maybe) and then deleting
after ensuring HBase is not affected.
Note: HBase will not retain data unnecessarily. The archive directory
retains data still referenced by tables and/or snapshots, and anything
else is cleaned up automatically. No part of that data is 'unused', so do
not delete it without checking first.
[1] - https://blog.cloudera.com/blog/2013/03/introduction-to-apache-hbase-snapshots/
Created on ‎08-28-2018 06:32 PM - edited ‎08-28-2018 06:37 PM
The version is hbase-0.92.1 cdh4.0.1.
Does this version contain an archive directory?
These are the subdirectories in the archive directory:
/hbase/.archive/-ROOT-
/hbase/.archive/.META.
/hbase/.archive/[tablename]
/hbase/.archive/[tablename]
/hbase/.archive/[tablename]
.....
There is no snapshot directory.
The hbase shell does not have a list_snapshots command.
Created ‎08-28-2018 06:37 PM
CDH 4.0.x sources: https://github.com/cloudera/hbase/tree/cdh4.0.0-release
I'm therefore unsure who/what created it in your environment.
Created on ‎08-28-2018 06:52 PM - edited ‎08-28-2018 07:22 PM
I perform major compaction twice a week.
Could the archive directory be related to compaction?
I looked at the JIRA issue below, but I do not know whether it affects my HBase version.
https://issues.apache.org/jira/browse/HBASE-10371
Also, I found the following warning log in the hbase master log:
WARN: org.apache.hadoop.hbase.util.FSTableDescriptors: the following folder is in hbase's root directory and doesn't contain a table descriptor, do consider deleting it : .archive
Created on ‎09-05-2018 11:13 PM - edited ‎09-05-2018 11:28 PM
Maybe my CDH version is not 4.0.1.
I found some inconsistencies in Cloudera Manager.
I checked the information for each host from the following Cloudera Manager menu:
[Cloudera Manager > Hosts > (Click one host) > Components]
The versions of the installed components on each host were:
----------------------------------------------
Cloudera Manager Agent 4.0.4
Cloudera Manager Management Daemons 4.0.4
HBase 0.92.1+67
----------------------------------------------
However, one of the HBase Region Servers reports version "0.94.15 + 114(0.94.15-cdh4.7.0)", which differs from the other Region Servers.
I want to free up space by deleting the '.archive' directory.
Can I delete the '.archive' directory myself?
If so, can I do it without interrupting the HBase service?
Or should I stop HBase, delete the directory, and then restart HBase?
Can I disable archiving after deleting the '.archive' directory?
I need your help.
Created ‎09-06-2018 01:07 AM
To get a definitive view of which version the service uses, please visit the HBase Master Web UI and check the version value shown on its homepage. That's the version you can assume the cluster uses.
Here's what I'd do; please go over it first to see whether it would be safe to follow in your environment:
- Check the number of rows in some critical, large table with a RowCounter job. Keep this figure for a data check after the operation below.
- Log in to a host with the most recent HBase version, run 'hbase shell' and then 'list_snapshots'. If any snapshots show up and you do not need them, delete them with 'delete_snapshot' commands. Once done, wait a few minutes and see if the used space begins to shrink as the HFiles from the snapshots are cleaned away. If it does, no further action is needed, and the rest of the points no longer apply to you.
- If there are no snapshots, or there's no such command, then stop HBase and MOVE (NOT DELETE, not yet) the .archive directory to /tmp.
- Restart HBase, and if it comes up, run a RowCounter again on the same table to check whether the count is still the same as, or very close to, the count taken earlier.
- If HBase comes up and the counts on your critical tables are the same as before, then proceed with deleting the .archive directory you've moved.
- If HBase does not come up, or the counts vary greatly, then put the .archive directory back in its previous path. In that case the directory cannot be deleted because it is in use by HBase, and you'll need to think of an alternative strategy for gaining space (deleting rows in HBase, dropping tables, expanding the cluster, etc.)
Does this help?
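The decision at the end of the steps above can be written down as a small check. This is only a sketch in Python: the function name `next_step` and the 1% `tolerance` are assumptions for illustration, since the post says "the same/very close" without giving a number. The two counts would come from the RowCounter job run before and after moving the directory (typically launched as `hbase org.apache.hadoop.hbase.mapreduce.RowCounter <table>`).

```python
def next_step(hbase_started, count_before, count_after, tolerance=0.01):
    """Return 'delete' or 'restore' for the .archive directory moved to /tmp.

    tolerance is a hypothetical threshold for "very close"; pick one that
    matches how much the table changes between the two RowCounter runs.
    """
    if not hbase_started:
        # HBase did not come back up: the directory is still needed.
        return "restore"
    if count_before == 0:
        # Avoid dividing by zero for an empty table.
        return "delete" if count_after == 0 else "restore"
    drift = abs(count_after - count_before) / count_before
    return "delete" if drift <= tolerance else "restore"


# Example with made-up counts.
print(next_step(True, 1_000_000, 1_000_000))   # delete
print(next_step(True, 1_000_000, 600_000))     # restore
print(next_step(False, 1_000_000, 1_000_000))  # restore
```

Anything other than 'delete' means the directory goes back to its previous path before any other cleanup is tried.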
Created on ‎09-06-2018 01:16 AM - edited ‎09-06-2018 01:24 AM
In addition, I found entries in the Hadoop HDFS audit log that access the .archive directory:
... ugi=hbase(auth:SIMPLE) ip=/192.16.1.150 cmd=mkdir src=/hbase/.archive/tableName/99082b8b557...
... ugi=hbase(auth:SIMPLE) ip=/192.16.1.150 cmd=rename src=/hbase/tableName/99082b8b557... dst=/hbase/.archive/tableName/99082b8b557...
The 192.16.1.150 server is the one server with a different version (0.94.15+114).
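To see which hosts touch the archive directory over a longer period, audit log lines like the two above could be tallied by client IP and command. The following Python sketch assumes the `key=value` layout shown in the post; the function name `archive_ops` is made up for illustration, and the sample lines are copied from the post with their truncated paths left as they are.

```python
import re
from collections import Counter

# Matches key=value pairs such as ip=/192.16.1.150 or cmd=rename.
FIELD = re.compile(r"(\w+)=(\S+)")

def archive_ops(lines, archive_root="/hbase/.archive"):
    """Count (client ip, cmd) pairs whose src or dst lies under archive_root."""
    counts = Counter()
    for line in lines:
        fields = dict(FIELD.findall(line))
        paths = (fields.get("src", ""), fields.get("dst", ""))
        if any(p.startswith(archive_root) for p in paths):
            ip = fields.get("ip", "?").lstrip("/")
            counts[(ip, fields.get("cmd", "?"))] += 1
    return counts

sample = [
    "... ugi=hbase(auth:SIMPLE) ip=/192.16.1.150 cmd=mkdir src=/hbase/.archive/tableName/99082b8b557...",
    "... ugi=hbase(auth:SIMPLE) ip=/192.16.1.150 cmd=rename src=/hbase/tableName/99082b8b557... dst=/hbase/.archive/tableName/99082b8b557...",
]

for (ip, cmd), n in sorted(archive_ops(sample).items()):
    print(ip, cmd, n)
```

If only one IP ever shows up in the result, that host is the likely writer of the directory.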
-----------------------------
This is the result of running the list_snapshots command in the hbase shell on 192.16.1.150, as you suggested.
hbase(main):001:0> list_snapshots
SNAPSHOT TABLE + CREATION TIME
ERROR: java.io.IOException: java.io.IOException: java.lang.NoSuchMethodException: org.apache.hadoop.hbase.ipc.HMasterInterface.listSnapshots()
at java.lang.Class.getMethod(Class.java:1605)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:334)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336)
Here is some help for this command:
List all snapshots taken (by printing the names and relative information).
Optional regular expression parameter could be used to filter the output
by snapshot name.
Examples:
hbase> list_snapshots
hbase> list_snapshots 'abc.*'
Created ‎09-27-2018 12:23 AM
However, after the deletion, the usage of the .archive directory keeps growing.
The 192.16.1.150 server is the one server with a different version (0.94.15+114).
I think this is caused by that one region server with the different version (0.94.15+114).
Is there an option to disable the .archive directory in this version?
Created ‎09-27-2018 12:39 AM
certainly not be running a service with different minor versions on
its hosts.
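A mixed-version cluster like the one in this thread can be spotted by comparing the version each host reports. The Python sketch below assumes the host-to-version mapping has been copied by hand from Cloudera Manager or the HBase web UIs; the function name `version_outliers` and the host names `rs-01` and `rs-02` are hypothetical, while the version strings and the 192.16.1.150 address are the ones reported earlier in the thread.

```python
from collections import Counter

def version_outliers(versions_by_host):
    """Return the hosts whose version differs from the most common one."""
    counts = Counter(versions_by_host.values())
    if len(counts) <= 1:
        return {}
    majority, _ = counts.most_common(1)[0]
    return {host: v for host, v in versions_by_host.items() if v != majority}

# rs-01 and rs-02 are placeholder names for the hosts on the common version.
hosts = {
    "rs-01": "0.92.1+67",
    "rs-02": "0.92.1+67",
    "192.16.1.150": "0.94.15+114",
}
print(version_outliers(hosts))  # {'192.16.1.150': '0.94.15+114'}
```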
