<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: oldWALs not getting cleared even with no replication in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/oldWALs-not-getting-cleared-even-with-no-replication/m-p/203030#M165033</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/16839/sanketplus.html" nodeid="16839"&gt;@sanket patel&lt;/A&gt; Intermittent ZooKeeper issues can cause the cleaner chores to fail. See:&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/HBASE-15234" target="_blank"&gt;https://issues.apache.org/jira/browse/HBASE-15234&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 19 Apr 2017 01:38:25 GMT</pubDate>
    <dc:creator>ssingla</dc:creator>
    <dc:date>2017-04-19T01:38:25Z</dc:date>
    <item>
      <title>oldWALs not getting cleared even with no replication</title>
      <link>https://community.cloudera.com/t5/Support-Questions/oldWALs-not-getting-cleared-even-with-no-replication/m-p/203029#M165032</link>
      <description>&lt;P&gt;Last week I was resizing an HDP cluster, and for that I decommissioned the DataNode: stopped the DataNode and RegionServer, formatted and resized the volumes, then recommissioned the node and started the RegionServer.&lt;/P&gt;&lt;P&gt;Everything went well and the cluster is in good shape. But since that day the /apps/hbase/data/oldWALs folder has been filling up, and it's not stopping.&lt;/P&gt;&lt;P&gt;This is what I have tried so far, in order:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;add hbase.replication=false =&amp;gt; restart (this worked for most people)&lt;/LI&gt;&lt;LI&gt;add hbase.master.logcleaner.ttl=10min =&amp;gt; restart&lt;/LI&gt;&lt;LI&gt;add hbase.master.logcleaner.plugins=org.apache.hadoop.hbase.master.cleaner.TimeToLiveLogCleaner =&amp;gt; restart&lt;/LI&gt;&lt;LI&gt;full cluster restart (HBase, HDFS, ZooKeeper, Ambari Metrics, everything)&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;I tried running the following, but it shows no log entries for any of the cleaner classes (LogCleaner, TimeToLiveLogCleaner, ReplicationLogCleaner):&lt;/P&gt;&lt;PRE&gt;cat /var/log/hbase/hbase-&amp;lt;hostname&amp;gt;.log.* | grep LogClean &lt;/PRE&gt;&lt;P&gt;Replication is disabled; I confirmed this by executing 'list_peers', which reported that replication is disabled. &lt;/P&gt;&lt;P&gt;I also checked the RegionServer logs, and they have always been moving WALs to the oldWALs folder (since the beginning), but it seems the files used to get cleared from oldWALs. There is no trace of any Cleaner class in any of the HBase Master logs. &lt;/P&gt;&lt;P&gt;Can anyone please help me debug this further? I appreciate the help &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;EDIT:&lt;/P&gt;&lt;P&gt;I then enabled replication, and I see this in the logs:&lt;/P&gt;&lt;PRE&gt;2017-04-18 12:52:41,908 INFO  [hdpm01:16000.activeMasterManager] zookeeper.RecoverableZooKeeper: Process identifier=replicationLogCleaner connecting to ZooKeeper ensemble=&amp;lt;zk-address&amp;gt;:2181
2017-04-18 12:52:41,908 INFO  [hdpm01:16000.activeMasterManager] zookeeper.ZooKeeper: Initiating client connection, connectString=&amp;lt;zk&amp;gt;:2181 sessionTimeout=1800000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@546df67f
2017-04-18 12:52:41,918 INFO  [hdpm01:16000.activeMasterManager-SendThread(hdps03.labs.ops.use1d.i.riva.co:2181)] zookeeper.ClientCnxn: Opening socket connection to server &amp;lt;zk&amp;gt;/10.10.220.138:2181. Will not attempt to authenticate using SASL (unknown error)
2017-04-18 12:52:41,920 INFO  [hdpm01:16000.activeMasterManager-SendThread(&amp;lt;zk&amp;gt;2181)] zookeeper.ClientCnxn: Socket connection established to &amp;lt;zk&amp;gt;/10.10.220.138:2181, initiating session
2017-04-18 12:52:41,924 INFO  [hdpm01:16000.activeMasterManager-SendThread(&amp;lt;zk&amp;gt;:2181)] zookeeper.ClientCnxn: Session establishment complete on server &amp;lt;zk&amp;gt;/10.10.220.138:2181, sessionid = 0x35b808847460065, negotiated timeout = 40000
2017-04-18 12:52:41,955 INFO  [hdpm01:16000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
&lt;/PRE&gt;&lt;P&gt;I was able to narrow it down further by enabling DEBUG logs, which show:&lt;/P&gt;&lt;PRE&gt;2017-04-18 13:22:42,046 DEBUG [hdpm01.labs.ops.use1b.i.riva.co,16000,1492519955260_ChoreService_1] master.BackupLogCleaner: Didn't find this log in hbase:backup, keeping: hdfs://&amp;lt;master&amp;gt;:8020/apps/hbase/data/oldWALs/&amp;lt;rs-address&amp;gt;%2C16020%2C1492001909933..meta.1492232550969.meta
...

2017-04-18 13:22:42,166 DEBUG [hdpm01.labs.ops.use1b.i.riva.co,16000,1492519955260_ChoreService_1] impl.BackupSystemTable: Check if WAL file has been already backed up in hbase:backup hdfs://&amp;lt;master&amp;gt;:8020/apps/hbase/data/oldWALs/&amp;lt;rs-address&amp;gt;%2C16020%2C1492434572100.default.1492501877892&lt;/PRE&gt;</description>
      <pubDate>Wed, 19 Apr 2017 00:56:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/oldWALs-not-getting-cleared-even-with-no-replication/m-p/203029#M165032</guid>
      <dc:creator>sanketplus</dc:creator>
      <dc:date>2017-04-19T00:56:31Z</dc:date>
    </item>
    <item>
      <title>Re: oldWALs not getting cleared even with no replication</title>
      <link>https://community.cloudera.com/t5/Support-Questions/oldWALs-not-getting-cleared-even-with-no-replication/m-p/203030#M165033</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/16839/sanketplus.html" nodeid="16839"&gt;@sanket patel&lt;/A&gt; Intermittent ZooKeeper issues can cause the cleaner chores to fail. See:&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/HBASE-15234" target="_blank"&gt;https://issues.apache.org/jira/browse/HBASE-15234&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Apr 2017 01:38:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/oldWALs-not-getting-cleared-even-with-no-replication/m-p/203030#M165033</guid>
      <dc:creator>ssingla</dc:creator>
      <dc:date>2017-04-19T01:38:25Z</dc:date>
    </item>
    <item>
      <title>Re: oldWALs not getting cleared even with no replication</title>
      <link>https://community.cloudera.com/t5/Support-Questions/oldWALs-not-getting-cleared-even-with-no-replication/m-p/203031#M165034</link>
      <description>&lt;P&gt;The last DEBUG lines helped me identify the cause: the HBase backup utility was preventing the oldWALs from being removed.&lt;/P&gt;&lt;P&gt;The command below had failed:&lt;/P&gt;&lt;PRE&gt;hbase backup full &amp;lt;s3-url&amp;gt; -t &amp;lt;table&amp;gt;&lt;/PRE&gt;&lt;P&gt;which I verified using &lt;/P&gt;&lt;PRE&gt;hbase backup history&lt;/PRE&gt;&lt;P&gt;So I removed the failed backups with&lt;/P&gt;&lt;PRE&gt;hbase backup delete &amp;lt;backup-id&amp;gt;&lt;/PRE&gt;&lt;P&gt;and the next moment, the oldWALs folder was cleared &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt; &lt;/P&gt;&lt;P&gt;This was quite an edge case, and I could not find it mentioned anywhere on the internet. Hope this helps someone.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Apr 2017 13:32:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/oldWALs-not-getting-cleared-even-with-no-replication/m-p/203031#M165034</guid>
      <dc:creator>sanketplus</dc:creator>
      <dc:date>2017-04-19T13:32:23Z</dc:date>
    </item>
    <item>
      <title>Re: oldWALs not getting cleared even with no replication</title>
      <link>https://community.cloudera.com/t5/Support-Questions/oldWALs-not-getting-cleared-even-with-no-replication/m-p/203032#M165035</link>
      <description>&lt;P&gt;Thanks &lt;A rel="user" href="https://community.cloudera.com/users/309/ssingla.html" nodeid="309"&gt;@ssingla&lt;/A&gt;, I found the issue. And thanks for pointing out something related; it might help in the future. &lt;/P&gt;</description>
      <pubDate>Wed, 19 Apr 2017 13:43:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/oldWALs-not-getting-cleared-even-with-no-replication/m-p/203032#M165035</guid>
      <dc:creator>sanketplus</dc:creator>
      <dc:date>2017-04-19T13:43:08Z</dc:date>
    </item>
  </channel>
</rss>

