Member since: 03-23-2017
Posts: 41
Kudos Received: 5
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1213 | 01-19-2018 08:05 AM |
| | 5963 | 12-01-2017 06:46 PM |
| | 5626 | 04-19-2017 06:32 AM |
05-29-2017
07:02 AM
Setting the timeouts from the HBase conf did not work for me; ZooKeeper's tickTime was what ended up determining the session timeout. Here's more info: https://superuser.blog/hbase-dead-regionserver/
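For context, the ZooKeeper server caps whatever session timeout the client asks for, which is why the HBase-side setting alone may not take effect. A rough sketch of the server-side zoo.cfg settings involved (the values shown are only illustrative):

# zoo.cfg on the ZooKeeper servers
tickTime=3000
# by default the negotiated session timeout is clamped to [2 * tickTime, 20 * tickTime];
# raise the ceiling explicitly if clients request something larger
minSessionTimeout=6000
maxSessionTimeout=90000

The client-side request (e.g. zookeeper.session.timeout in hbase-site.xml) is then negotiated down to whatever the server allows.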
04-19-2017
06:43 AM
Thanks @ssingla, I found the issue. And thanks for pointing out something related; it might help in the future.
04-19-2017
06:32 AM
1 Kudo
The last DEBUG lines helped me identify the cause: it was the hbase backup utility that was preventing the oldWALs from being removed. The command below had failed:
hbase backup full <s3-url> -t <table>
and that was verified using:
hbase backup history
So, to remove the failed backups:
hbase backup delete <backup-id>
and the next moment it all cleared 😄. This was a pretty edge case and it was mentioned nowhere on the internet. Hope this helps someone.
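Putting the whole check-and-clean sequence in one place, as a sketch using the same placeholders as above (the backup ids come from the history output):

# the full backup that had been failing
hbase backup full <s3-url> -t <table>
# list backup sessions and note the ids of the failed ones
hbase backup history
# delete each failed session so its WALs are no longer pinned
hbase backup delete <backup-id>

The likely mechanism, going by the DEBUG lines in the question, is that the BackupLogCleaner only deletes WALs it can confirm as backed up in hbase:backup, so failed backup sessions leave old WALs pinned indefinitely.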
04-18-2017
05:56 PM
Last week I was resizing an HDP cluster, and for that I decommissioned the DataNode: stopped the DataNode and RegionServer, formatted and resized the volumes, then recommissioned and started the RegionServer. Everything went well and the cluster is in good shape. But since that day the /apps/hbase/data/oldWALs folder has been filling up and it's not stopping. This is what I have tried so far, in order:
- set hbase.replication=false => restart (this worked for most people)
- set hbase.master.logcleaner.ttl=10min => restart
- set hbase.master.logcleaner.plugins=org.apache.hadoop.hbase.master.cleaner.TimeToLiveLogCleaner => restart
- full cluster restart (HBase, HDFS, ZooKeeper, Ambari Metrics, everything)
I tried to run the following, but it has no log lines for any of the classes (LogCleaner, TimeToLiveLogCleaner, ReplicationLogCleaner):
cat /var/log/hbase/hbase-<hostname>.log.* | grep LogClean
Replication is disabled, and I confirmed that by executing 'list_peers', which said replication is disabled. I also checked the RegionServer logs, and it has always been moving WALs to the oldWALs folder (since the beginning), but previously they were getting cleared from oldWALs. There is no trace of a Cleaner class in any of the HBase master logs. Can anyone please help me debug this further? I appreciate the help 🙂 Thanks!
EDIT: I further enabled replication and I see this in the logs:
2017-04-18 12:52:41,908 INFO [hdpm01:16000.activeMasterManager] zookeeper.RecoverableZooKeeper: Process identifier=replicationLogCleaner connecting to ZooKeeper ensemble=<zk-address>:2181
2017-04-18 12:52:41,908 INFO [hdpm01:16000.activeMasterManager] zookeeper.ZooKeeper: Initiating client connection, connectString=<zk>:2181 sessionTimeout=1800000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@546df67f
2017-04-18 12:52:41,918 INFO [hdpm01:16000.activeMasterManager-SendThread(hdps03.labs.ops.use1d.i.riva.co:2181)] zookeeper.ClientCnxn: Opening socket connection to server <zk>/10.10.220.138:2181. Will not attempt to authenticate using SASL (unknown error)
2017-04-18 12:52:41,920 INFO [hdpm01:16000.activeMasterManager-SendThread(<zk>2181)] zookeeper.ClientCnxn: Socket connection established to <zk>/10.10.220.138:2181, initiating session
2017-04-18 12:52:41,924 INFO [hdpm01:16000.activeMasterManager-SendThread(<zk>:2181)] zookeeper.ClientCnxn: Session establishment complete on server <zk>/10.10.220.138:2181, sessionid = 0x35b808847460065, negotiated timeout = 40000
2017-04-18 12:52:41,955 INFO [hdpm01:16000.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
I was able to narrow it down further by enabling DEBUG logs. It says:
2017-04-18 13:22:42,046 DEBUG [hdpm01.labs.ops.use1b.i.riva.co,16000,1492519955260_ChoreService_1] master.BackupLogCleaner: Didn't find this log in hbase:backup, keeping: hdfs://<master>:8020/apps/hbase/data/oldWALs/<rs-address>%2C16020%2C1492001909933..meta.1492232550969.meta
...
2017-04-18 13:22:42,166 DEBUG [hdpm01.labs.ops.use1b.i.riva.co,16000,1492519955260_ChoreService_1] impl.BackupSystemTable: Check if WAL file has been already backed up in hbase:backup hdfs://<master>:8020/apps/hbase/data/oldWALs/<rs-address>%2C16020%2C1492434572100.default.1492501877892
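For reference, this is roughly how DEBUG logging for the cleaner chores can be turned on, assuming the stock log4j.properties that ships with HBase on HDP (the logger names below follow the usual package layout; the exact package for BackupLogCleaner may differ by version):

# hbase log4j.properties on the HMaster host (or Ambari's "Advanced hbase-log4j")
# covers LogCleaner, TimeToLiveLogCleaner and the CleanerChore
log4j.logger.org.apache.hadoop.hbase.master.cleaner=DEBUG
# covers the ReplicationLogCleaner
log4j.logger.org.apache.hadoop.hbase.replication.master=DEBUG

After restarting the HMaster, the cleaner chore logs which WALs it keeps and why, as in the lines above.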
Labels:
- Apache Hadoop
- Apache HBase
04-18-2017
02:27 PM
https://community.hortonworks.com/questions/97184/oldwals-not-getting-cleared-even-with-no-replicati.html
04-17-2017
10:22 AM
After stopping the DataNode, the HBase RegionServer still holds file descriptors on the mounts, and it needs to be stopped before you can unmount the volumes.
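A quick way to confirm what is still holding a mount before unmounting (illustrative commands; the mount point shown is just an example):

# list processes with files open under the data mount
lsof +D /hadoop/data1 2>/dev/null | head
# or, more coarsely, show which PIDs are using the filesystem
fuser -vm /hadoop/data1

If the RegionServer PID shows up, stop it first and then unmount.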
04-17-2017
10:15 AM
I am in the exact same situation you were in. Adding the replication=false property did not help and it's still growing. The strange part is that it has only been growing for the last 7 days; it was normal before that. Any clues what's happening here?
04-10-2017
08:48 AM
Can you please explain why we need to stop all services on the host when only the DataNode and NodeManager are affected? Why do we also need to stop the RegionServer and the other services?
03-23-2017
01:17 PM
HDP follows a different folder structure and it also creates various symlinks for the jars and folders. If you have done this before, or have any idea how to go about it, can you please share the details?
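For illustration, this is roughly what the HDP layout looks like on a node, assuming the usual hdp-select convention (the version number shown is just an example):

# version-specific install directories plus a 'current' pointer
ls /usr/hdp/
# e.g. 2.5.3.0-37  current
# 'current' is a set of symlinks that point each component at the active version
ls -l /usr/hdp/current/ | grep hbase
# hbase-client -> /usr/hdp/2.5.3.0-37/hbase
# hbase-regionserver -> /usr/hdp/2.5.3.0-37/hbase

So any manual jar changes end up going through these links rather than a plain Apache-style install path.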