
The HBase oldWALs folder is very large in CDH5.3.2 & CM5.3.2, how to clean it?

Re: The HBase oldWALs folder is very large in CDH5.3.2 & CM5.3.2, how to clean it?

Explorer
Hi,
Apologies for being so late to continue this thread.
I looked into the logs; here are the key log lines from our production environment.
Our cluster has been running since July 2015, and the oldWALs cleaning behavior was normal.
We can see:
**********************
2015-07-24 16:03:36,348 WARN org.apache.hadoop.hbase.master.cleaner.TimeToLiveLogCleaner: Found a log (hdfs://mygameNS/hbase/oldWALs/data6.cdh5.mygame.com%2C60020%2C1436948613604.1437717832214) newer than current time (1437725016348 < 1437725032675), probably a clock skew
*********************
Starting in August 2015, we shut down a few servers to upgrade the hardware.
So we can see:
***********************
2015-08-10 16:55:21,026 INFO org.apache.hadoop.hbase.master.AssignmentManager: Server data3.cdh5.mygame.com,60020,1437723463046 returned org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: data3.cdh5.mygame.com/10.1.9.150:60020 for p1_m_activity_svr,\x031406822400f0883b00dde842fcc3231ffdbc128033,1437762599367.5061dc8d57e4061fd2ad858ee5b8223e., try=10 of 10
org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: data3.cdh5.mygame.com/10.1.9.150:60020
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:853)
at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.closeRegion(AdminProtos.java:20976)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1719)
at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:730)
at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1705)
at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1822)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1453)
at org.apache.hadoop.hbase.master.AssignCallable.call(AssignCallable.java:45)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-08-10 16:55:21,027 WARN org.apache.hadoop.hbase.master.RegionStates: Failed to open/close 5061dc8d57e4061fd2ad858ee5b8223e on data3.cdh5.mygame.com,60020,1437723463046, set to FAILED_CLOSE
2015-08-10 16:55:21,027 INFO org.apache.hadoop.hbase.master.RegionStates: Transitioned {5061dc8d57e4061fd2ad858ee5b8223e state=PENDING_OPEN, ts=1439196902996, server=data3.cdh5.mygame.com,60020,1437723463046} to {5061dc8d57e4061fd2ad858ee5b8223e state=FAILED_CLOSE, ts=1439196921027, server=data3.cdh5.mygame.com,60020,1437723463046}
************************
So the issue appeared. The logs:
******************************
2015-08-10 16:55:32,942 WARN org.apache.hadoop.hbase.master.cleaner.CleanerChore: A file cleanermaster:master:60000.oldLogCleaner is stopped, won't delete any more files in:hdfs://mygameNS/hbase/oldWALs
******************************
I reviewed the source code; it seems the cleaner chain is stopped.
So I guess this may be an HBase bug. Why was the cleaner chain stopped?
Restarting the cluster can probably resolve the issue, but I am afraid of hitting it again without knowing how to fix it.
BTW, I tried to reproduce the issue, but the oldWALs were cleaned normally, even after I enabled hbase.replication.
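For anyone checking the same symptom, this is a minimal sketch for sizing up the retained old WALs; it assumes the default HBase root dir of /hbase, which may differ on your cluster:

```shell
# Sketch: measure how much space /hbase/oldWALs is holding.
# WALS_DIR assumes the default HBase root dir; adjust for your cluster.
WALS_DIR="/hbase/oldWALs"

if command -v hdfs >/dev/null 2>&1; then
    # Total size of the directory, human readable
    hdfs dfs -du -s -h "$WALS_DIR" || true
    # Directory / file / byte counts
    hdfs dfs -count "$WALS_DIR" || true
else
    echo "hdfs client not found; run this on a cluster gateway node"
fi
```

If the size keeps growing while the LogCleaner warnings above appear in the master log, the cleaner chore is likely stopped rather than merely slow.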

Looking forward to your reply.

Paul Yang.

Re: The HBase oldWALs folder is very large in CDH5.3.2 & CM5.3.2, how to clean it?

Explorer
Hi,
We have finally cleaned up the oldWALs:
1. I added hbase.replication = false via CM => HBase => Configuration => Advanced => HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml.
2. Restarted the backup HMaster.
3. Restarted the active HMaster.
In addition, the issue may be related to HBASE-3489, but I am not sure.
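For reference, the safety-valve entry from step 1 ends up in hbase-site.xml as a standard property block like this:

```xml
<property>
  <name>hbase.replication</name>
  <value>false</value>
</property>
```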

Thanks
Paul Yang

Re: The HBase oldWALs folder is very large in CDH5.3.2 & CM5.3.2, how to clean it?

Expert Contributor

Can confirm, disabling replication if you do not have a peer solves the issue.
