Replication is disabled and I confirmed by executing 'list_peer' and it said replication is disabled.
I also checked RegionServer logs and it always has been moving WALs to oldWALs folder. (since the beginning) But it was getting cleared from oldWALs it seems. There is no trace of Cleaner class in any of the Hbase master logs.
Can anyone please help me debug this further? I appreciate the help 🙂
I further enabled replication and I see this on logs:
2017-04-18 12:52:41,908 INFO [hdpm01:16000.activeMasterManager] zookeeper.RecoverableZooKeeper: Process identifier=replicationLogCleaner connecting to ZooKeeper ensemble=<zk-address>:2181
2017-04-18 12:52:41,908 INFO [hdpm01:16000.activeMasterManager] zookeeper.ZooKeeper: Initiating client connection, connectString=<zk>:2181 sessionTimeout=1800000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@546df67f
2017-04-18 12:52:41,918 INFO [hdpm01:16000.activeMasterManager-SendThread(hdps03.labs.ops.use1d.i.riva.co:2181)] zookeeper.ClientCnxn: Opening socket connection to server <zk>/10.10.220.138:2181. Will not attempt to authenticate using SASL (unknown error)
2017-04-18 12:52:41,920 INFO [hdpm01:16000.activeMasterManager-SendThread(<zk>2181)] zookeeper.ClientCnxn: Socket connection established to <zk>/10.10.220.138:2181, initiating session
2017-04-18 12:52:41,924 INFO [hdpm01:16000.activeMasterManager-SendThread(<zk>:2181)] zookeeper.ClientCnxn: Session establishment complete on server <zk>/10.10.220.138:2181, sessionid = 0x35b808847460065, negotiated timeout = 40000
2017-04-18 12:52:41,955 INFO [hdpm01:16000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
I was able to narrow it down further by enabling DEBUG logs. It says
2017-04-18 13:22:42,046 DEBUG [hdpm01.labs.ops.use1b.i.riva.co,16000,1492519955260_ChoreService_1] master.BackupLogCleaner: Didn't find this log in hbase:backup, keeping: hdfs://<master>:8020/apps/hbase/data/oldWALs/<rs-address>%2C16020%2C1492001909933..meta.1492232550969.meta
2017-04-18 13:22:42,166 DEBUG [hdpm01.labs.ops.use1b.i.riva.co,16000,1492519955260_ChoreService_1] impl.BackupSystemTable: Check if WAL file has been already backed up in hbase:backup hdfs://<master>:8020/apps/hbase/data/oldWALs/<rs-address>%2C16020%2C1492434572100.default.1492501877892