Support Questions

Find answers, ask questions, and share your expertise

Hbase master and regionserver goes down citing lease expired

avatar
Rising Star

All service are up after a restart. But this is a recurring issue. So posting the log to check if this is bug or configuration issue.Habse master and regionserver goes down with following log.

-------------------------------

2016-10-07 03:01:02,867 INFO  [imp1tvhdpmst1.corp.test.com,16000,1475152605958_splitLogManager__ChoreService_1] master.SplitLogManager$TimeoutMonitor: Chore: SplitLogManager Timeout Monitor missed its start time
2016-10-07 03:01:02,898 INFO  [main-SendThread(imp1tvhdpmst2.corp.test.com:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 65745ms for sessionid 0x1572d7e3442032c, closing socket connection and attempting reconnect
2016-10-07 03:01:02,898 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x3572d7e3c28020c, likely server has closed socket, closing socket connection and attempting reconnect
2016-10-07 03:01:02,902 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x3572d7e3c28020b, likely server has closed socket, closing socket connection and attempting reconnect
2016-10-07 03:01:02,902 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst3.corp.test.com:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x2572d7e364901d1, likely server has closed socket, closing socket connection and attempting reconnect
2016-10-07 03:01:03,096 INFO  [main-SendThread(imp1tvhdpmst4.corp.test.com:2181)] client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
2016-10-07 03:01:03,096 INFO  [main-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server imp1tvhdpmst4.corp.test.com/172.24.125.133:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2016-10-07 03:01:03,096 INFO  [main-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Socket connection established to imp1tvhdpmst4.corp.test.com/172.24.125.133:2181, initiating session
2016-10-07 03:01:03,100 INFO  [main-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x1572d7e3442032c has expired, closing socket connection
2016-10-07 03:01:03,100 FATAL [main-EventThread] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor]
2016-10-07 03:01:03,101 FATAL [main-EventThread] master.HMaster: master:16000-0x1572d7e3442032c, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure master:16000-0x1572d7e3442032c received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:613)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:524)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-07 03:01:03,102 INFO  [main-EventThread] regionserver.HRegionServer: STOPPED: master:16000-0x1572d7e3442032c, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure master:16000-0x1572d7e3442032c received expired from ZooKeeper, aborting
2016-10-07 03:01:03,102 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-10-07 03:01:03,102 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] util.Sleeper: We slept 63817ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2016-10-07 03:01:03,102 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] regionserver.HRegionServer: Stopping infoServer
2016-10-07 03:01:03,124 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] mortbay.log: Stopped SelectChannelConnector@0.0.0.0:16010
2016-10-07 03:01:03,174 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst3.corp.test.com:2181)] client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
2016-10-07 03:01:03,174 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst3.corp.test.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server imp1tvhdpmst3.corp.test.com/172.24.125.132:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2016-10-07 03:01:03,175 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst3.corp.test.com:2181)] zookeeper.ClientCnxn: Socket connection established to imp1tvhdpmst3.corp.test.com/172.24.125.132:2181, initiating session
2016-10-07 03:01:03,176 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst3.corp.test.com:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x3572d7e3c28020c has expired, closing socket connection
2016-10-07 03:01:03,176 WARN  [imp1tvhdpmst1:16000.activeMasterManager-EventThread] client.ConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, closing it. It will be recreated next time someone needs it
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:613)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:524)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-07 03:01:03,177 INFO  [imp1tvhdpmst1:16000.activeMasterManager-EventThread] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3572d7e3c28020c
2016-10-07 03:01:03,177 INFO  [imp1tvhdpmst1:16000.activeMasterManager-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-10-07 03:01:03,224 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] procedure2.ProcedureExecutor: Stopping the procedure executor
2016-10-07 03:01:03,225 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] wal.WALProcedureStore: Stopping the WAL Procedure Store
2016-10-07 03:01:03,272 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] regionserver.HRegionServer: stopping server imp1tvhdpmst1.corp.test.com,16000,1475152605958
2016-10-07 03:01:03,272 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3572d7e3c28020b
2016-10-07 03:01:03,452 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst4.corp.test.com:2181)] client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
2016-10-07 03:01:03,452 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server imp1tvhdpmst4.corp.test.com/172.24.125.133:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2016-10-07 03:01:03,452 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Socket connection established to imp1tvhdpmst4.corp.test.com/172.24.125.133:2181, initiating session
2016-10-07 03:01:03,454 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x2572d7e364901d1 has expired, closing socket connection
2016-10-07 03:01:03,454 INFO  [imp1tvhdpmst1:16000.activeMasterManager-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-10-07 03:01:03,499 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.ZooKeeper: Session: 0x3572d7e3c28020b closed
2016-10-07 03:01:03,499 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] regionserver.HRegionServer: stopping server imp1tvhdpmst1.corp.test.com,16000,1475152605958; all regions closed.
2016-10-07 03:01:03,499 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-10-07 03:01:03,500 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] hbase.ChoreService: Chore service for: imp1tvhdpmst1.corp.test.com,16000,1475152605958 had [[ScheduledChore: Name: HFileCleaner Period: 60000 Unit: MILLISECONDS], [ScheduledChore: Name: imp1tvhdpmst1.corp.test.com,16000,1475152605958-BalancerChore Period: 300000 Unit: MILLISECONDS], [ScheduledChore: Name: CatalogJanitor-imp1tvhdpmst1:16000 Period: 300000 Unit: MILLISECONDS], [ScheduledChore: Name: imp1tvhdpmst1.corp.test.com,16000,1475152605958-RegionNormalizerChore Period: 1800000 Unit: MILLISECONDS], [ScheduledChore: Name: LogsCleaner Period: 60000 Unit: MILLISECONDS], [ScheduledChore: Name: imp1tvhdpmst1.corp.test.com,16000,1475152605958-ClusterStatusChore Period: 60000 Unit: MILLISECONDS]] on shutdown
2016-10-07 03:01:03,501 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master
2016-10-07 03:01:03,635 ERROR [PriorityRpcServer.handler=8,queue=0,port=16000] master.MasterRpcServices: Region server imp1tvhdpslv4.corp.test.com,16020,1475152621893 reported a fatal error:
ABORTING region server imp1tvhdpslv4.corp.test.com,16020,1475152621893: regionserver:16020-0x1572d7e3442032e, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure regionserver:16020-0x1572d7e3442032e received expired from ZooKeeper, aborting
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:613)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:524)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-07 03:01:03,635 ERROR [PriorityRpcServer.handler=10,queue=0,port=16000] master.MasterRpcServices: Region server imp1tvhdpslv1.corp.test.com,16020,1475152617424 reported a fatal error:
ABORTING region server imp1tvhdpslv1.corp.test.com,16020,1475152617424: regionserver:16020-0x3572d7e3c28020d, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure regionserver:16020-0x3572d7e3c28020d received expired from ZooKeeper, aborting
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:613)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:524)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-07 03:01:03,867 INFO  [imp1tvhdpmst1.corp.test.com,16000,1475152605958_splitLogManager__ChoreService_1] master.SplitLogManager$TimeoutMonitor: Chore: SplitLogManager Timeout Monitor was stopped
2016-10-07 03:01:04,501 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master
2016-10-07 03:01:06,502 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master
2016-10-07 03:01:10,509 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master
2016-10-07 03:01:18,032 ERROR [PriorityRpcServer.handler=13,queue=1,port=16000] master.MasterRpcServices: Region server imp1tvhdpslv2.corp.test.com,16020,1475152618858 reported a fatal error:
ABORTING region server imp1tvhdpslv2.corp.test.com,16020,1475152618858: Get list of registered region servers
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/rs
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:295)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:454)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:482)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl.getRegisteredRegionServers(ReplicationTrackerZKImpl.java:245)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl.refreshOtherRegionServersList(ReplicationTrackerZKImpl.java:226)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl.access$400(ReplicationTrackerZKImpl.java:42)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl$OtherRegionServerWatcher.refreshListIfRightPath(ReplicationTrackerZKImpl.java:142)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl$OtherRegionServerWatcher.nodeDeleted(ReplicationTrackerZKImpl.java:117)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:539)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-07 03:01:18,033 ERROR [PriorityRpcServer.handler=11,queue=1,port=16000] master.MasterRpcServices: Region server imp1tvhdpslv3.corp.test.com,16020,1475152617489 reported a fatal error:
ABORTING region server imp1tvhdpslv3.corp.test.com,16020,1475152617489: Get list of registered region servers
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/rs
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:295)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:454)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:482)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl.getRegisteredRegionServers(ReplicationTrackerZKImpl.java:245)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl.refreshOtherRegionServersList(ReplicationTrackerZKImpl.java:226)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl.access$400(ReplicationTrackerZKImpl.java:42)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl$OtherRegionServerWatcher.refreshListIfRightPath(ReplicationTrackerZKImpl.java:142)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl$OtherRegionServerWatcher.nodeDeleted(ReplicationTrackerZKImpl.java:117)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:539)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-07 03:01:18,336 ERROR [PriorityRpcServer.handler=7,queue=1,port=16000] master.MasterRpcServices: Region server imp1tvhdpslv2.corp.test.com,16020,1475152618858 reported a fatal error:
ABORTING region server imp1tvhdpslv2.corp.test.com,16020,1475152618858: regionserver:16020-0x2572d7e364901d4, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure regionserver:16020-0x2572d7e364901d4 received expired from ZooKeeper, aborting
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:613)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:524)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-07 03:01:18,372 ERROR [PriorityRpcServer.handler=6,queue=0,port=16000] master.MasterRpcServices: Region server imp1tvhdpslv3.corp.test.com,16020,1475152617489 reported a fatal error:
ABORTING region server imp1tvhdpslv3.corp.test.com,16020,1475152617489: regionserver:16020-0x1572d7e3442032d, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure regionserver:16020-0x1572d7e3442032d received expired from ZooKeeper, aborting
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:613)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:524)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-07 03:01:18,509 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master
2016-10-07 03:01:18,510 ERROR [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 4 attempts
2016-10-07 03:01:18,510 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.ZKUtil: master:16000-0x1572d7e3442032c, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure Unable to get data of znode /hbase-secure/master
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:621)
at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)
at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:267)
at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1175)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1071)
at java.lang.Thread.run(Thread.java:745)
2016-10-07 03:01:18,510 ERROR [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.ZooKeeperWatcher: master:16000-0x1572d7e3442032c, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:621)
at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)
at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:267)
at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1175)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1071)
at java.lang.Thread.run(Thread.java:745)
2016-10-07 03:01:18,510 ERROR [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] master.ActiveMasterManager: master:16000-0x1572d7e3442032c, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure Error deleting our own master address node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:621)
at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)
at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:267)
at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1175)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1071)
at java.lang.Thread.run(Thread.java:745)
2016-10-07 03:01:18,512 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] hbase.ChoreService: Chore service for: imp1tvhdpmst1.corp.test.com,16000,1475152605958_splitLogManager_ had [] on shutdown
2016-10-07 03:01:18,512 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] flush.MasterFlushTableProcedureManager: stop: server shutting down.
2016-10-07 03:01:18,512 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] ipc.RpcServer: Stopping server on 16000
2016-10-07 03:01:18,512 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/tokenauth/keymaster
2016-10-07 03:01:19,512 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/tokenauth/keymaster
2016-10-07 03:01:21,512 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/tokenauth/keymaster
2016-10-07 03:01:23,132 WARN  [LeaseRenewer:hbase@imp1tvhdpmst3.corp.test.com:8020] hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-1572577966_107] for 30 seconds.  Will retry shortly ...
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1932)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1313)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewLease(FSNamesystem.java:4534)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewLease(NameNodeRpcServer.java:1089)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewLease(ClientNamenodeProtocolServerSideTranslatorPB.java:660)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)
at org.apache.hadoop.ipc.Client.call(Client.java:1426)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy19.renewLease(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:592)
at sun.reflect.GeneratedMethodAccessor124.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy20.renewLease(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:892)
at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417)
at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442)
at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
at java.lang.Thread.run(Thread.java:745)
2016-10-07 03:01:24,135 WARN  [LeaseRenewer:hbase@imp1tvhdpmst3.corp.test.com:8020] hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-1572577966_107] for 31 seconds.  Will retry shortly ...
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1932)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1313)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewLease(FSNamesystem.java:4534)
Regionserver Log:
2016-10-07 03:01:02,894 INFO  [main-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 71395ms for sessionid 0x3572d7e3c28020d, closing socket connection and attempting reconnect
2016-10-07 03:01:02,899 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020-SendThread(imp1tvhdpmst3.corp.test.com:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x2572d7e364901d2, likely server has closed socket, closing socket connection and attempting reconnect
2016-10-07 03:01:03,603 INFO  [main-SendThread(imp1tvhdpmst2.corp.test.com:2181)] client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
2016-10-07 03:01:03,603 INFO  [main-SendThread(imp1tvhdpmst2.corp.test.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server imp1tvhdpmst2.corp.test.com/172.24.125.131:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2016-10-07 03:01:03,604 INFO  [main-SendThread(imp1tvhdpmst2.corp.test.com:2181)] zookeeper.ClientCnxn: Socket connection established to imp1tvhdpmst2.corp.test.com/172.24.125.131:2181, initiating session
2016-10-07 03:01:03,608 INFO  [main-SendThread(imp1tvhdpmst2.corp.test.com:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x3572d7e3c28020d has expired, closing socket connection
2016-10-07 03:01:03,608 FATAL [main-EventThread] regionserver.HRegionServer: ABORTING region server imp1tvhdpslv1.corp.test.com,16020,1475152617424: regionserver:16020-0x3572d7e3c28020d, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure regionserver:16020-0x3572d7e3c28020d received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:613)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:524)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-07 03:01:03,608 FATAL [main-EventThread] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint, org.apache.hadoop.hbase.security.token.TokenProvider, org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint, org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor]
2016-10-07 03:01:03,623 INFO  [main-EventThread] regionserver.HRegionServer: Dump of metrics as JSON on abort: {
  "beans" : [ {
    "name" : "java.lang:type=Memory",
    "modelerType" : "sun.management.MemoryImpl",
    "NonHeapMemoryUsage" : {
      "committed" : 142868480,
      "init" : 136773632,
      "max" : 184549376,
      "used" : 63971032
    },
    "ObjectPendingFinalizationCount" : 0,
    "Verbose" : true,
    "HeapMemoryUsage" : {
      "committed" : 8536260608,
      "init" : 8589934592,
      "max" : 8536260608,
      "used" : 795318008
    },
    "ObjectName" : "java.lang:type=Memory"
  } ],
  "beans" : [ {
    "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
    "modelerType" : "RegionServer,sub=IPC",
    "tag.Context" : "regionserver",
    "tag.Hostname" : "imp1tvhdpslv1.corp.test.com",
    "queueSize" : 0,
    "numCallsInGeneralQueue" : 0,
    "numCallsInReplicationQueue" : 0,
    "numCallsInPriorityQueue" : 0,
    "numOpenConnections" : 0,
    "numActiveHandler" : 0,
    "TotalCallTime_num_ops" : 6766,
    "TotalCallTime_min" : 0,
    "TotalCallTime_max" : 4398,
    "TotalCallTime_mean" : 22.693319538870824,
    "TotalCallTime_median" : 0.0,
    "TotalCallTime_75th_percentile" : 1.0,
    "TotalCallTime_95th_percentile" : 85.0,
    "TotalCallTime_99th_percentile" : 251.58999999999992,
    "exceptions.FailedSanityCheckException" : 0,
    "exceptions.RegionMovedException" : 0,
    "QueueCallTime_num_ops" : 6766,
    "QueueCallTime_min" : 0,
    "QueueCallTime_max" : 25,
    "QueueCallTime_mean" : 0.11409991132131245,
    "QueueCallTime_median" : 0.0,
    "QueueCallTime_75th_percentile" : 0.0,
    "QueueCallTime_95th_percentile" : 1.0,
    "QueueCallTime_99th_percentile" : 1.0,
    "authenticationFailures" : 0,
    "authorizationFailures" : 0,
    "exceptions" : 6,
    "authenticationSuccesses" : 74,
    "authorizationSuccesses" : 74,
    "ProcessCallTime_num_ops" : 6766,
    "ProcessCallTime_min" : 0,
    "ProcessCallTime_max" : 4398,
    "ProcessCallTime_mean" : 22.579219627549513,
    "ProcessCallTime_median" : 0.0,
    "ProcessCallTime_75th_percentile" : 1.0,
    "ProcessCallTime_95th_percentile" : 85.0,
    "ProcessCallTime_99th_percentile" : 251.58999999999992,
    "exceptions.NotServingRegionException" : 6,
    "sentBytes" : 82090551,
    "exceptions.RegionTooBusyException" : 0,
    "receivedBytes" : 205125313,
    "exceptions.OutOfOrderScannerNextException" : 0,
    "exceptions.UnknownScannerException" : 0
  } ],
  "beans" : [ {
    "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
    "modelerType" : "RegionServer,sub=Replication",
    "tag.Context" : "regionserver",
    "tag.Hostname" : "imp1tvhdpslv1.corp.test.com",
    "sink.appliedOps" : 0,
    "sink.appliedBatches" : 0,
    "sink.ageOfLastAppliedOp" : 0
  } ],
  "beans" : [ {
    "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
    "modelerType" : "RegionServer,sub=Server",
    "tag.zookeeperQuorum" : "imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181",
    "tag.serverName" : "imp1tvhdpslv1.corp.test.com,16020,1475152617424",
    "tag.clusterId" : "af4af9f4-724e-42a0-9b08-c3ec0b260656",
    "tag.Context" : "regionserver",
    "tag.Hostname" : "imp1tvhdpslv1.corp.test.com",
    "regionCount" : 12,
    "storeCount" : 12,
    "hlogFileCount" : 2,
    "hlogFileSize" : 0,
    "storeFileCount" : 11,
    "memStoreSize" : 5088,
    "storeFileSize" : 22766460359,
    "regionServerStartTime" : 1475152617424,
    "totalRequestCount" : 3028580,
    "readRequestCount" : 6564,
    "writeRequestCount" : 252,
    "checkMutateFailedCount" : 0,
    "checkMutatePassedCount" : 0,
    "storeFileIndexSize" : 17472,
    "staticIndexSize" : 19363382,
    "staticBloomSize" : 271565340,
    "mutationsWithoutWALCount" : 0,
    "mutationsWithoutWALSize" : 0,
    "percentFilesLocal" : 66,
    "percentFilesLocalSecondaryRegions" : 0,
    "splitQueueLength" : 0,
    "compactionQueueLength" : 0,
    "flushQueueLength" : 0,
    "blockCacheFreeSize" : 3410842632,
    "blockCacheCount" : 3,
    "blockCacheSize" : 3661560,
    "blockCacheHitCount" : 3171629,
    "blockCacheHitCountPrimary" : 3171629,
    "blockCacheMissCount" : 51343,
    "blockCacheMissCountPrimary" : 51343,
    "blockCacheEvictionCount" : 4600,
    "blockCacheEvictionCountPrimary" : 4600,
    "blockCacheCountHitPercent" : 98.0,
    "blockCacheExpressHitPercent" : 99,
    "updatesBlockedTime" : 0,
    "flushedCellsCount" : 252,
    "compactedCellsCount" : 0,
    "majorCompactedCellsCount" : 0,
    "flushedCellsSize" : 57768,
    "compactedCellsSize" : 0,
    "majorCompactedCellsSize" : 0,
    "blockedRequestCount" : 0,
    "splitSuccessCount" : 1,
    "splitRequestCount" : 1,
    "Append_num_ops" : 0,
    "Append_min" : 0,
    "Append_max" : 0,
    "Append_mean" : 0.0,
    "Append_median" : 0.0,
    "Append_75th_percentile" : 0.0,
    "Append_95th_percentile" : 0.0,
    "Append_99th_percentile" : 0.0,
    "Delete_num_ops" : 0,
    "Delete_min" : 0,
    "Delete_max" : 0,
    "Delete_mean" : 0.0,
    "Delete_median" : 0.0,
    "Delete_75th_percentile" : 0.0,
    "Delete_95th_percentile" : 0.0,
    "Delete_99th_percentile" : 0.0,
    "Mutate_num_ops" : 583,
    "Mutate_min" : 1,
    "Mutate_max" : 412,
    "Mutate_mean" : 61.06003430531732,
    "Mutate_median" : 79.0,
    "Mutate_75th_percentile" : 84.0,
    "Mutate_95th_percentile" : 110.0,
    "Mutate_99th_percentile" : 126.6,
    "ScanNext_num_ops" : 21,
    "ScanNext_min" : 0,
    "ScanNext_max" : 31287,
    "ScanNext_mean" : 16041.0,
    "ScanNext_median" : 9034.0,
    "ScanNext_75th_percentile" : 12580.0,
    "ScanNext_95th_percentile" : 12680.0,
    "ScanNext_99th_percentile" : 12680.0,
    "slowDeleteCount" : 0,
    "slowIncrementCount" : 0,
    "FlushTime_num_ops" : 3,
    "FlushTime_min" : 1239,
    "FlushTime_max" : 21322,
    "FlushTime_mean" : 13931.333333333334,
    "FlushTime_median" : 19233.0,
    "FlushTime_75th_percentile" : 21322.0,
    "FlushTime_95th_percentile" : 21322.0,
    "FlushTime_99th_percentile" : 21322.0,
    "Get_num_ops" : 5372,
    "Get_min" : 0,
    "Get_max" : 47,
    "Get_mean" : 0.22412509307520476,
    "Get_median" : 0.0,
    "Get_75th_percentile" : 0.0,
    "Get_95th_percentile" : 1.0,
    "Get_99th_percentile" : 1.0,
    "Replay_num_ops" : 0,
    "Replay_min" : 0,
    "Replay_max" : 0,
    "Replay_mean" : 0.0,
    "Replay_median" : 0.0,
    "Replay_75th_percentile" : 0.0,
    "Replay_95th_percentile" : 0.0,
    "Replay_99th_percentile" : 0.0,
    "slowGetCount" : 0,
    "slowAppendCount" : 0,
    "slowPutCount" : 0,
    "SplitTime_num_ops" : 1,
    "SplitTime_min" : 4334,
    "SplitTime_max" : 4334,
    "SplitTime_mean" : 4334.0,
    "SplitTime_median" : 4334.0,
    "SplitTime_75th_percentile" : 4334.0,
    "SplitTime_95th_percentile" : 4334.0,
    "SplitTime_99th_percentile" : 4334.0,
    "Increment_num_ops" : 0,
    "Increment_min" : 0,
    "Increment_max" : 0,
    "Increment_mean" : 0.0,
    "Increment_median" : 0.0,
    "Increment_75th_percentile" : 0.0,
    "Increment_95th_percentile" : 0.0,
    "Increment_99th_percentile" : 0.0
  } ]
}
2016-10-07 03:01:03,638 INFO  [main-EventThread] regionserver.HRegionServer: STOPPED: regionserver:16020-0x3572d7e3c28020d, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure regionserver:16020-0x3572d7e3c28020d received expired from ZooKeeper, aborting
2016-10-07 03:01:03,638 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-10-07 03:01:03,638 WARN  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] util.Sleeper: We slept 62101ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2016-10-07 03:01:03,639 INFO  [org.apache.ranger.audit.queue.AuditBatchQueue0] provider.BaseAuditHandler: Audit Status Log: name=hbaseRegional.async.summary.multi_dest.batch, finalDestination=hbaseRegional.async.summary.multi_dest.batch.hdfs, interval=01:03.866 minutes, events=1, totalEvents=585, totalSuccessCount=250
2016-10-07 03:01:03,639 INFO  [org.apache.ranger.audit.queue.AuditBatchQueue1] provider.BaseAuditHandler: Audit Status Log: name=hbaseRegional.async.summary.multi_dest.batch, finalDestination=hbaseRegional.async.summary.multi_dest.batch.solr, interval=01:03.860 minutes, events=1, totalEvents=552, totalSuccessCount=250
2016-10-07 03:01:03,639 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
2016-10-07 03:01:03,640 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.HRegionServer: Stopping infoServer
2016-10-07 03:01:03,640 INFO  [SplitLogWorker-imp1tvhdpslv1:16020] regionserver.SplitLogWorker: SplitLogWorker interrupted. Exiting. 
2016-10-07 03:01:03,640 INFO  [org.apache.ranger.audit.queue.AuditBatchQueue1] provider.BaseAuditHandler: Audit Status Log: name=hbaseRegional.async.summary.multi_dest.batch.solr, interval=08:47:58.911 hours, events=2, succcessCount=2, totalEvents=250, totalSuccessCount=250
2016-10-07 03:01:03,640 INFO  [org.apache.ranger.audit.queue.AuditBatchQueue0] provider.BaseAuditHandler: Audit Status Log: name=hbaseRegional.async.summary.multi_dest.batch.hdfs, interval=08:47:58.893 hours, events=2, succcessCount=2, totalEvents=250, totalSuccessCount=250
2016-10-07 03:01:03,640 INFO  [SplitLogWorker-imp1tvhdpslv1:16020] regionserver.SplitLogWorker: SplitLogWorker imp1tvhdpslv1.corp.test.com,16020,1475152617424 exiting
2016-10-07 03:01:03,642 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] mortbay.log: Stopped SelectChannelConnector@0.0.0.0:16030
2016-10-07 03:01:03,743 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.HeapMemoryManager: Stoping HeapMemoryTuner chore.
2016-10-07 03:01:03,744 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] snapshot.RegionServerSnapshotManager: Stopping RegionServerSnapshotManager abruptly.
2016-10-07 03:01:03,744 INFO  [MemStoreFlusher.1] regionserver.MemStoreFlusher: MemStoreFlusher.1 exiting
2016-10-07 03:01:03,744 INFO  [MemStoreFlusher.0] regionserver.MemStoreFlusher: MemStoreFlusher.0 exiting
2016-10-07 03:01:03,744 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] flush.RegionServerFlushTableProcedureManager: Stopping region server flush procedure manager abruptly.
2016-10-07 03:01:03,749 INFO  [StoreCloserThread-ambarismoketest,,1466187475382.7ee7f2de55e70e37f2fb23624f9e593e.-1] regionserver.HStore: Closed family
2016-10-07 03:01:03,750 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.HRegionServer: aborting server imp1tvhdpslv1.corp.test.com,16020,1475152617424
2016-10-07 03:01:03,750 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-2] regionserver.HRegion: Closed ambarismoketest,,1466187475382.7ee7f2de55e70e37f2fb23624f9e593e.
2016-10-07 03:01:03,750 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x2572d7e364901d2
2016-10-07 03:01:03,750 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020-SendThread(imp1tvhdpmst4.corp.test.com:2181)] client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
2016-10-07 03:01:03,752 INFO  [StoreCloserThread-du_secure_vault_tst_jcb,,1475653102504.d6e8b646cabc9a73fee39da3f71afcc9.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,752 INFO  [StoreCloserThread-du_secure_vault_onedaydata_dq,contract_main_resource_num_token_99445B2,1473230358405.57a88265f314ba4a9ff713deef241c37.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,753 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server imp1tvhdpmst4.corp.test.com/172.24.125.133:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2016-10-07 03:01:03,753 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-1] regionserver.HRegion: Closed du_secure_vault_tst_jcb,,1475653102504.d6e8b646cabc9a73fee39da3f71afcc9.
2016-10-07 03:01:03,753 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Socket connection established to imp1tvhdpmst4.corp.test.com/172.24.125.133:2181, initiating session
2016-10-07 03:01:03,755 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-0] regionserver.HRegion: Closed du_secure_vault_onedaydata_dq,contract_main_resource_num_token_99445B2,1473230358405.57a88265f314ba4a9ff713deef241c37.
2016-10-07 03:01:03,755 INFO  [StoreCloserThread-sun_test,,1464509020983.12c9c4b8684478c63917f0253bf930b8.-1] regionserver.HStore: Closed cf
2016-10-07 03:01:03,755 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-2] regionserver.HRegion: Closed sun_test,,1464509020983.12c9c4b8684478c63917f0253bf930b8.
2016-10-07 03:01:03,757 INFO  [StoreCloserThread-du_secure_vault_tst_may,,1475763178137.0d211090290d0cac9aceb17d29c2e05a.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,757 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-0] regionserver.HRegion: Closed du_secure_vault_tst_may,,1475763178137.0d211090290d0cac9aceb17d29c2e05a.
2016-10-07 03:01:03,758 INFO  [StoreCloserThread-du_hbase_krb_tst,,1465128507241.11f2c36247ad6f3348ca4013eb99daad.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,758 INFO  [StoreCloserThread-du_secure_vault_onedaydata_dq,msisdn_raw_key_97152680792,1472729475904.00d916acbfa9044e067b9b6b9ec0e601.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,759 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-1] regionserver.HRegion: Closed du_hbase_krb_tst,,1465128507241.11f2c36247ad6f3348ca4013eb99daad.
2016-10-07 03:01:03,759 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-2] regionserver.HRegion: Closed du_secure_vault_onedaydata_dq,msisdn_raw_key_97152680792,1472729475904.00d916acbfa9044e067b9b6b9ec0e601.
2016-10-07 03:01:03,762 INFO  [StoreCloserThread-du_secure_vault_onedaydata,callingnormalized_key_97155790835,1470233185954.293f370188569fa959fe704c6eca64e8.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,762 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-0] regionserver.HRegion: Closed du_secure_vault_onedaydata,callingnormalized_key_97155790835,1470233185954.293f370188569fa959fe704c6eca64e8.
2016-10-07 03:01:03,763 INFO  [StoreCloserThread-du_secure_vault_onedaydata,contract_storage_medium_num_token_849867732,1470235487700.89859097282ea3fb2868bc90ded9653a.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,764 INFO  [StoreCloserThread-du_secure_vault_onedaydata,imsi_key_42403021011613,1468488066776.0c2171e0aaba28588914a58f5f70152e.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,764 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-1] regionserver.HRegion: Closed du_secure_vault_onedaydata,contract_storage_medium_num_token_849867732,1470235487700.89859097282ea3fb2868bc90ded9653a.
2016-10-07 03:01:03,765 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-2] regionserver.HRegion: Closed du_secure_vault_onedaydata,imsi_key_42403021011613,1468488066776.0c2171e0aaba28588914a58f5f70152e.
2016-10-07 03:01:03,767 INFO  [StoreCloserThread-du_secure_vault_onedaydata_dq,cust_id_token_5568245463,1474068738847.8110480ef3807a8e15f7508a979f6fe8.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,767 INFO  [StoreCloserThread-du_secure_vault,,1465883027720.d506c1c78844ea16909341ca618e18e3.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,767 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-0] regionserver.HRegion: Closed du_secure_vault_onedaydata_dq,cust_id_token_5568245463,1474068738847.8110480ef3807a8e15f7508a979f6fe8.
2016-10-07 03:01:03,768 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-1] regionserver.HRegion: Closed du_secure_vault,,1465883027720.d506c1c78844ea16909341ca618e18e3.
2016-10-07 03:01:03,857 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] zookeeper.ZooKeeper: Session: 0x2572d7e364901d2 closed
2016-10-07 03:01:03,857 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-10-07 03:01:03,857 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.HRegionServer: stopping server imp1tvhdpslv1.corp.test.com,16020,1475152617424; all regions closed.
2016-10-07 03:01:03,895 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.Leases: regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020 closing leases
2016-10-07 03:01:03,895 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.Leases: regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020 closed leases
2016-10-07 03:01:03,896 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] hbase.ChoreService: Chore service for: imp1tvhdpslv1.corp.test.com,16020,1475152617424 had [[ScheduledChore: Name: MovedRegionsCleaner for region imp1tvhdpslv1.corp.test.com,16020,1475152617424 Period: 120000 Unit: MILLISECONDS], [ScheduledChore: Name: imp1tvhdpslv1.corp.test.com,16020,1475152617424-MemstoreFlusherChore Period: 10000 Unit: MILLISECONDS]] on shutdown
2016-10-07 03:01:08,678 INFO  [RS_OPEN_META-imp1tvhdpslv1:16020-0-MetaLogRoller] regionserver.LogRoller: LogRoller exiting.
2016-10-07 03:01:08,679 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020.logRoller] regionserver.LogRoller: LogRoller exiting.
2016-10-07 03:01:08,680 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.CompactSplitThread: Waiting for Split Thread to finish...
2016-10-07 03:01:08,680 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.CompactSplitThread: Waiting for Merge Thread to finish...
2016-10-07 03:01:08,680 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.CompactSplitThread: Waiting for Large Compaction Thread to finish...
2016-10-07 03:01:08,680 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] region
1 ACCEPTED SOLUTION

avatar
Guru

The regionserver is failing because of this:

  1. 2016-10-0703:01:03,102 WARN [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] util.Sleeper:We slept 63817ms instead of 3000ms,thisis likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

Please check your GC settings and tune the GC.

View solution in original post

1 REPLY 1

avatar
Guru

The regionserver is failing because of this:

  1. 2016-10-0703:01:03,102 WARN [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] util.Sleeper:We slept 63817ms instead of 3000ms,thisis likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

Please check your GC settings and tune the GC.