Created 06-20-2016 12:01 PM
Hello,
I have been looking for information about this, but I did not find anything. Our HDFS file system is growing a lot, and I can see that about 60% of the space is used by the "oldWALs" folder. Why is it growing so much? How can I get rid of this?
We deployed our cluster with Ambari with almost no customization. We don't have replication to other clusters or anything like that; everything is quite standard.
Best regards,
Silvio
Created 06-20-2016 01:16 PM
There are currently two services which may keep the files in the archive directory. The first is a TTL process, which ensures that WAL files are kept for at least 10 minutes. This is controlled by the hbase.master.logcleaner.ttl configuration property on the master.
The other one is replication. If you had replication set up before, the replication processes will hang on to the WAL files until they have been replicated. Even if you have disabled replication, the files are still referenced.
You can check the master logs for messages from the LogCleaner, TimeToLiveLogCleaner and ReplicationLogCleaner classes to see whether any exceptions were thrown.
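To make the interaction between the two services concrete, here is a simplified Python model of the idea described above: a WAL file in oldWALs is deleted only when every cleaner agrees it is deletable. This is a sketch under assumptions, not HBase's actual implementation; the class and function names are hypothetical, and the 10-minute default simply mirrors the TTL mentioned above for hbase.master.logcleaner.ttl.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


# Hypothetical stand-in for a file sitting in the oldWALs directory.
@dataclass(frozen=True)
class WalFile:
    name: str
    mtime_ms: int  # last modification time, epoch milliseconds


# A cleaner returns True if, from its point of view, the file may be deleted.
Cleaner = Callable[[WalFile], bool]


def ttl_cleaner(now_ms: int, ttl_ms: int = 10 * 60 * 1000) -> Cleaner:
    """Keep every file for at least ttl_ms (10 minutes in this model)."""
    return lambda f: now_ms - f.mtime_ms > ttl_ms


def replication_cleaner(referenced: Set[str]) -> Cleaner:
    """Keep any file that is still referenced by a replication queue."""
    return lambda f: f.name not in referenced


def deletable(files: Iterable[WalFile], cleaners: List[Cleaner]) -> List[str]:
    """A file is deleted only if *all* cleaners in the chain agree."""
    return [f.name for f in files if all(c(f) for c in cleaners)]


if __name__ == "__main__":
    now = 1_000_000_000
    files = [
        WalFile("wal.1", now - 3_600_000),  # one hour old, unreferenced
        WalFile("wal.2", now - 3_600_000),  # one hour old, still referenced
        WalFile("wal.3", now - 60_000),     # one minute old
    ]
    chain = [ttl_cleaner(now), replication_cleaner({"wal.2"})]
    print(deletable(files, chain))  # ['wal.1']
```

In this model, a file that passes the TTL check is still kept as long as the replication check says no, so if that check keeps answering "not deletable" (for example because of stale replication state), oldWALs keeps growing regardless of the TTL.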
Created 06-20-2016 12:05 PM
remove_peer can fix the problem.
This can help out - http://stackoverflow.com/questions/28725364/hbase-oldwals-what-it-is-and-how-can-i-clean-it
Created 06-20-2016 12:51 PM
The "list_peers" command returns "0 row(s) in 0.1060 seconds". I tested that before asking and I don't know what's happening. Any other hints?
Created 06-20-2016 01:14 PM
Please post the output of "ls /hbase/replication" (and sub-znodes under it) via your "zookeeper-client" shell command. If there are any znodes under there, you will need to clean them up with rm/rmr in the same shell. Once done, try restarting the HMaster and the cleaner should be able to wipe it away.
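The check suggested above amounts to listing every znode below /hbase/replication and looking for leftovers, in particular anything under rs, where per-server replication queues may keep WAL files referenced even when peers is empty. The sketch below models the znode tree as a nested Python dict; this is a hypothetical stand-in (in practice you would browse the tree with zookeeper-client as shown in this thread), and the server name and queue layout in the example are made up for illustration.

```python
from typing import Dict, List

# Hypothetical in-memory model of a znode tree: each key is a child znode and
# each value is that child's own subtree (an empty dict means a leaf znode).
ZNodeTree = Dict[str, "ZNodeTree"]


def list_znodes(tree: ZNodeTree, prefix: str) -> List[str]:
    """Return the full paths of all znodes below prefix, depth-first, sorted."""
    paths: List[str] = []
    for child in sorted(tree):
        path = f"{prefix}/{child}"
        paths.append(path)
        paths.extend(list_znodes(tree[child], path))
    return paths


def leftover_queues(replication: ZNodeTree) -> List[str]:
    """Znodes under /hbase/replication/rs that may still reference WAL files."""
    return list_znodes(replication.get("rs", {}), "/hbase/replication/rs")


if __name__ == "__main__":
    # Made-up example: no peers, but one queue left behind under rs.
    replication = {
        "peers": {},
        "rs": {"rs1.example.com,16020,1466400000000": {"1": {"wal.123": {}}}},
    }
    for path in leftover_queues(replication):
        print(path)
```

An empty result for rs, together with an empty peers list, would suggest there is nothing left in ZooKeeper for the replication cleaner to hold on to.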
If you are a hundred percent sure that you do not have any form of replication in use, nor any snapshots, you may also choose to delete the files in the oldWALs directory manually.
Created 06-20-2016 01:56 PM
Hi,
This is the output you requested:
[zk: localhost:2181(CONNECTED) 4] ls /hbase/replication
[peers, rs]
[zk: localhost:2181(CONNECTED) 5] ls /hbase/replication/peers
[]
No replication to other peers
Created 06-20-2016 02:16 PM
Well, the "archive" folder under /apps/hbase/data remains under control and doesn't grow. My problem is "oldWALs" under the same path. I don't have any kind of replication.
Created 06-20-2016 01:21 PM
A disabled replication peer still holds on to the WAL files, because disabling guarantees that no data is lost between disable and enable. You can execute remove_peer, which makes those WAL files eligible for deletion. If you later add the replication peer again, replication starts from the current state, whereas a re-enabled peer continues from where it left off.
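The difference between disabling a peer and removing it can be illustrated with a toy model of per-peer WAL queues. This is a hypothetical sketch of the semantics described above, not HBase code: the class and method names are made up, and "shipping" is reduced to clearing the queue of an enabled peer.

```python
from typing import Dict, List, Set


class ReplicationQueuesModel:
    """Hypothetical model of per-peer WAL queues (not HBase's implementation)."""

    def __init__(self) -> None:
        self.queues: Dict[str, List[str]] = {}  # peer id -> WALs still queued
        self.enabled: Dict[str, bool] = {}

    def add_peer(self, peer_id: str) -> None:
        # A newly added peer starts from the current state: an empty queue.
        self.queues[peer_id] = []
        self.enabled[peer_id] = True

    def remove_peer(self, peer_id: str) -> None:
        # Dropping the queue releases every WAL it referenced.
        self.queues.pop(peer_id, None)
        self.enabled.pop(peer_id, None)

    def set_enabled(self, peer_id: str, on: bool) -> None:
        self.enabled[peer_id] = on

    def roll_wal(self, wal: str) -> None:
        # Every peer, enabled or not, must eventually receive each new WAL.
        for queue in self.queues.values():
            queue.append(wal)

    def ship(self) -> None:
        # Enabled peers replicate and release their WALs; disabled peers keep them.
        for peer_id, queue in self.queues.items():
            if self.enabled[peer_id]:
                queue.clear()

    def referenced(self) -> Set[str]:
        # WALs the cleaner would have to keep in oldWALs.
        return {wal for queue in self.queues.values() for wal in queue}


if __name__ == "__main__":
    m = ReplicationQueuesModel()
    m.add_peer("1")
    m.set_enabled("1", False)
    m.roll_wal("wal.1")
    m.roll_wal("wal.2")
    m.ship()
    print(sorted(m.referenced()))  # ['wal.1', 'wal.2']: the disabled peer holds them
    m.remove_peer("1")
    print(sorted(m.referenced()))  # []: remove_peer frees them
```

Re-enabling instead of removing would ship wal.1 and wal.2 on the next round (continuing where the peer left off), while removing and re-adding the peer starts it with an empty queue.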
Created 06-20-2016 02:18 PM
No "peers" at all here
Created 06-21-2016 07:51 AM
OK, the issue is resolved. I had to explicitly add the custom property "hbase.replication=false" to hbase-site.xml (even though we have no replication at all and no peers configured) and restart the HBase masters. After that, about 50 TB of data in the oldWALs folder was deleted automatically within about 10 minutes 🙂
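If you want to confirm that a given hbase-site.xml actually contains this property before restarting the masters, a small standard-library Python check like the one below works on the usual configuration/property/name/value layout of Hadoop-style site files. It is only a sketch: the function names are hypothetical, it merely reads the XML, and the masters still have to be restarted to pick up the change.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def read_site_properties(xml_text: str) -> Dict[str, str]:
    """Parse <property><name>..</name><value>..</value></property> entries."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def replication_disabled(xml_text: str) -> bool:
    """True only if hbase.replication is explicitly set to false."""
    value = read_site_properties(xml_text).get("hbase.replication", "")
    return value.lower() == "false"


# Example entry as it would appear in hbase-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>hbase.replication</name>
    <value>false</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(replication_disabled(SAMPLE))  # True
```

A file without the property returns False here, which matches the situation in this thread, where replication was not explicitly switched off until the property was added.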
Thank you very much to all of you, you helped me a lot 🙂