oldWALs folder growing too much....

Rising Star

Hello,

I have been looking for information about this, but I did not find anything. Our HDFS file system is growing a lot, and I can see that about 60% of the space is in the "oldWALs" folder. Why is it growing so much? How can I get rid of this?

We deployed our cluster with Ambari with almost no customization. We don't have replication to other clusters or anything like that; everything is quite standard.

Best regards,

Silvio

1 ACCEPTED SOLUTION

Master Collaborator

There are currently two services which may keep files in the archive directory. The first is a TTL process, which ensures that the WAL files are kept for at least 10 minutes. This is controlled by the hbase.master.logcleaner.ttl configuration property on the master.

The other one is replication. If you had replication set up before, the replication processes will hang on to the WAL files until they are replicated. Even if you disabled replication, the files are still referenced.

You can look at the logs of master from classes (LogCleaner, TimeToLiveLogCleaner, ReplicationLogCleaner) to see whether any exception was thrown.
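As a rough sketch of that log check: the function below filters a master log for lines from the cleaner classes named above that also mention an exception or error. The function name `cleaner_errors` is hypothetical, and the log path in the usage note is an assumption about a typical install, not something stated in this thread.

```shell
# Hypothetical helper: show master-log lines from the WAL cleaner classes
# that mention an exception or error. The caller passes the log file path.
cleaner_errors() {
  log_file="$1"
  # Keep only lines from the cleaner classes, then narrow to error-like lines.
  grep -E 'LogCleaner|TimeToLiveLogCleaner|ReplicationLogCleaner' "$log_file" \
    | grep -iE 'exception|error'
}
```

Usage would look something like `cleaner_errors /var/log/hbase/hbase-hbase-master-myhost.log`, where the path is only a guess; use wherever your HMaster actually writes its log.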

11 REPLIES

Super Guru
@Silvio del Val

HBase keeps the replication WAL logs until the peer is removed.

Running remove_peer can fix the problem.

This can help out - http://stackoverflow.com/questions/28725364/hbase-oldwals-what-it-is-and-how-can-i-clean-it

Rising Star

The "list_peers" command returns "0 row(s) in 0.1060 seconds". I tested that before asking, and I don't know what's happening. Any other hints?

Super Guru

@Silvio del Val

Please post the output of "ls /hbase/replication" (and sub-znodes under it) via your "zookeeper-client" shell command. If there are any znodes under there, you will need to clean them up with rm/rmr in the same shell. Once done, try restarting the HMaster and the cleaner should be able to wipe it away.

If you are a hundred percent sure you do not have any form of replication whatsoever in use, nor any snapshots, you may also choose to delete the files in the oldWALs directory manually.

Rising Star

Hi,

This is the output you requested:

[zk: localhost:2181(CONNECTED) 4] ls /hbase/replication

[peers, rs]

[zk: localhost:2181(CONNECTED) 5] ls /hbase/replication/peers

[]

No replication to other peers

Rising Star

Well, the "archive" folder under "/apps/hbase/data" remains under control and doesn't grow. My problem is "oldWALs" under the same path. I don't have any kind of replication.

Master Collaborator

Disabled replication still holds on to the WAL files, because it guarantees that no data is lost between disable and enable. You can execute remove_peer, which frees the WAL files for deletion. When you add the replication peer again, replication starts from the current state, whereas re-enabling a peer continues from where it left off.

Rising Star

No "peers" at all here

Rising Star

OK, the issue is resolved. I had to explicitly add the custom property "hbase.replication=false" to hbase-site.xml (although we have no replication at all and no peers configured) and restart the HBase masters. After this, about 50 TB of data in the oldWALs folder was deleted automatically in about 10 minutes 🙂
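For anyone wanting to confirm that the property actually landed in the file the master reads, a minimal sketch follows. The helper name `hbase_site_value` is hypothetical, and it assumes each `<name>` element is immediately followed by its `<value>` on the next line, which may not hold for every hbase-site.xml.

```shell
# Hypothetical helper: print the value of one property from an
# hbase-site.xml-style file. Assumes <value> is on the line right after <name>.
hbase_site_value() {
  site_file="$1"
  prop="$2"
  # -F treats the property name literally; -A1 also returns the following line.
  grep -F -A1 "<name>${prop}</name>" "$site_file" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}
```

A call such as `hbase_site_value /etc/hbase/conf/hbase-site.xml hbase.replication` should print `false` once the property is in place; the config path here is an assumption, so point it at your own file. An empty result means the property was not found in that layout.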

Thank you very much to all of you, you helped me a lot 🙂