Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hbase master and regionserver goes down citing lease expired

avatar
Rising Star

All service are up after a restart. But this is a recurring issue. So posting the log to check if this is bug or configuration issue.Habse master and regionserver goes down with following log.

-------------------------------

2016-10-07 03:01:02,867 INFO  [imp1tvhdpmst1.corp.test.com,16000,1475152605958_splitLogManager__ChoreService_1] master.SplitLogManager$TimeoutMonitor: Chore: SplitLogManager Timeout Monitor missed its start time
2016-10-07 03:01:02,898 INFO  [main-SendThread(imp1tvhdpmst2.corp.test.com:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 65745ms for sessionid 0x1572d7e3442032c, closing socket connection and attempting reconnect
2016-10-07 03:01:02,898 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x3572d7e3c28020c, likely server has closed socket, closing socket connection and attempting reconnect
2016-10-07 03:01:02,902 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x3572d7e3c28020b, likely server has closed socket, closing socket connection and attempting reconnect
2016-10-07 03:01:02,902 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst3.corp.test.com:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x2572d7e364901d1, likely server has closed socket, closing socket connection and attempting reconnect
2016-10-07 03:01:03,096 INFO  [main-SendThread(imp1tvhdpmst4.corp.test.com:2181)] client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
2016-10-07 03:01:03,096 INFO  [main-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server imp1tvhdpmst4.corp.test.com/172.24.125.133:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2016-10-07 03:01:03,096 INFO  [main-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Socket connection established to imp1tvhdpmst4.corp.test.com/172.24.125.133:2181, initiating session
2016-10-07 03:01:03,100 INFO  [main-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x1572d7e3442032c has expired, closing socket connection
2016-10-07 03:01:03,100 FATAL [main-EventThread] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor]
2016-10-07 03:01:03,101 FATAL [main-EventThread] master.HMaster: master:16000-0x1572d7e3442032c, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure master:16000-0x1572d7e3442032c received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:613)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:524)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-07 03:01:03,102 INFO  [main-EventThread] regionserver.HRegionServer: STOPPED: master:16000-0x1572d7e3442032c, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure master:16000-0x1572d7e3442032c received expired from ZooKeeper, aborting
2016-10-07 03:01:03,102 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-10-07 03:01:03,102 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] util.Sleeper: We slept 63817ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2016-10-07 03:01:03,102 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] regionserver.HRegionServer: Stopping infoServer
2016-10-07 03:01:03,124 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] mortbay.log: Stopped SelectChannelConnector@0.0.0.0:16010
2016-10-07 03:01:03,174 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst3.corp.test.com:2181)] client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
2016-10-07 03:01:03,174 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst3.corp.test.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server imp1tvhdpmst3.corp.test.com/172.24.125.132:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2016-10-07 03:01:03,175 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst3.corp.test.com:2181)] zookeeper.ClientCnxn: Socket connection established to imp1tvhdpmst3.corp.test.com/172.24.125.132:2181, initiating session
2016-10-07 03:01:03,176 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst3.corp.test.com:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x3572d7e3c28020c has expired, closing socket connection
2016-10-07 03:01:03,176 WARN  [imp1tvhdpmst1:16000.activeMasterManager-EventThread] client.ConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, closing it. It will be recreated next time someone needs it
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:613)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:524)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-07 03:01:03,177 INFO  [imp1tvhdpmst1:16000.activeMasterManager-EventThread] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3572d7e3c28020c
2016-10-07 03:01:03,177 INFO  [imp1tvhdpmst1:16000.activeMasterManager-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-10-07 03:01:03,224 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] procedure2.ProcedureExecutor: Stopping the procedure executor
2016-10-07 03:01:03,225 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] wal.WALProcedureStore: Stopping the WAL Procedure Store
2016-10-07 03:01:03,272 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] regionserver.HRegionServer: stopping server imp1tvhdpmst1.corp.test.com,16000,1475152605958
2016-10-07 03:01:03,272 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3572d7e3c28020b
2016-10-07 03:01:03,452 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst4.corp.test.com:2181)] client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
2016-10-07 03:01:03,452 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server imp1tvhdpmst4.corp.test.com/172.24.125.133:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2016-10-07 03:01:03,452 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Socket connection established to imp1tvhdpmst4.corp.test.com/172.24.125.133:2181, initiating session
2016-10-07 03:01:03,454 INFO  [imp1tvhdpmst1:16000.activeMasterManager-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x2572d7e364901d1 has expired, closing socket connection
2016-10-07 03:01:03,454 INFO  [imp1tvhdpmst1:16000.activeMasterManager-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-10-07 03:01:03,499 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.ZooKeeper: Session: 0x3572d7e3c28020b closed
2016-10-07 03:01:03,499 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] regionserver.HRegionServer: stopping server imp1tvhdpmst1.corp.test.com,16000,1475152605958; all regions closed.
2016-10-07 03:01:03,499 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-10-07 03:01:03,500 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] hbase.ChoreService: Chore service for: imp1tvhdpmst1.corp.test.com,16000,1475152605958 had [[ScheduledChore: Name: HFileCleaner Period: 60000 Unit: MILLISECONDS], [ScheduledChore: Name: imp1tvhdpmst1.corp.test.com,16000,1475152605958-BalancerChore Period: 300000 Unit: MILLISECONDS], [ScheduledChore: Name: CatalogJanitor-imp1tvhdpmst1:16000 Period: 300000 Unit: MILLISECONDS], [ScheduledChore: Name: imp1tvhdpmst1.corp.test.com,16000,1475152605958-RegionNormalizerChore Period: 1800000 Unit: MILLISECONDS], [ScheduledChore: Name: LogsCleaner Period: 60000 Unit: MILLISECONDS], [ScheduledChore: Name: imp1tvhdpmst1.corp.test.com,16000,1475152605958-ClusterStatusChore Period: 60000 Unit: MILLISECONDS]] on shutdown
2016-10-07 03:01:03,501 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master
2016-10-07 03:01:03,635 ERROR [PriorityRpcServer.handler=8,queue=0,port=16000] master.MasterRpcServices: Region server imp1tvhdpslv4.corp.test.com,16020,1475152621893 reported a fatal error:
ABORTING region server imp1tvhdpslv4.corp.test.com,16020,1475152621893: regionserver:16020-0x1572d7e3442032e, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure regionserver:16020-0x1572d7e3442032e received expired from ZooKeeper, aborting
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:613)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:524)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-07 03:01:03,635 ERROR [PriorityRpcServer.handler=10,queue=0,port=16000] master.MasterRpcServices: Region server imp1tvhdpslv1.corp.test.com,16020,1475152617424 reported a fatal error:
ABORTING region server imp1tvhdpslv1.corp.test.com,16020,1475152617424: regionserver:16020-0x3572d7e3c28020d, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure regionserver:16020-0x3572d7e3c28020d received expired from ZooKeeper, aborting
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:613)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:524)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-07 03:01:03,867 INFO  [imp1tvhdpmst1.corp.test.com,16000,1475152605958_splitLogManager__ChoreService_1] master.SplitLogManager$TimeoutMonitor: Chore: SplitLogManager Timeout Monitor was stopped
2016-10-07 03:01:04,501 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master
2016-10-07 03:01:06,502 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master
2016-10-07 03:01:10,509 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master
2016-10-07 03:01:18,032 ERROR [PriorityRpcServer.handler=13,queue=1,port=16000] master.MasterRpcServices: Region server imp1tvhdpslv2.corp.test.com,16020,1475152618858 reported a fatal error:
ABORTING region server imp1tvhdpslv2.corp.test.com,16020,1475152618858: Get list of registered region servers
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/rs
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:295)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:454)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:482)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl.getRegisteredRegionServers(ReplicationTrackerZKImpl.java:245)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl.refreshOtherRegionServersList(ReplicationTrackerZKImpl.java:226)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl.access$400(ReplicationTrackerZKImpl.java:42)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl$OtherRegionServerWatcher.refreshListIfRightPath(ReplicationTrackerZKImpl.java:142)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl$OtherRegionServerWatcher.nodeDeleted(ReplicationTrackerZKImpl.java:117)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:539)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-07 03:01:18,033 ERROR [PriorityRpcServer.handler=11,queue=1,port=16000] master.MasterRpcServices: Region server imp1tvhdpslv3.corp.test.com,16020,1475152617489 reported a fatal error:
ABORTING region server imp1tvhdpslv3.corp.test.com,16020,1475152617489: Get list of registered region servers
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/rs
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:295)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:454)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:482)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl.getRegisteredRegionServers(ReplicationTrackerZKImpl.java:245)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl.refreshOtherRegionServersList(ReplicationTrackerZKImpl.java:226)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl.access$400(ReplicationTrackerZKImpl.java:42)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl$OtherRegionServerWatcher.refreshListIfRightPath(ReplicationTrackerZKImpl.java:142)
at org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl$OtherRegionServerWatcher.nodeDeleted(ReplicationTrackerZKImpl.java:117)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:539)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-07 03:01:18,336 ERROR [PriorityRpcServer.handler=7,queue=1,port=16000] master.MasterRpcServices: Region server imp1tvhdpslv2.corp.test.com,16020,1475152618858 reported a fatal error:
ABORTING region server imp1tvhdpslv2.corp.test.com,16020,1475152618858: regionserver:16020-0x2572d7e364901d4, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure regionserver:16020-0x2572d7e364901d4 received expired from ZooKeeper, aborting
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:613)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:524)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-07 03:01:18,372 ERROR [PriorityRpcServer.handler=6,queue=0,port=16000] master.MasterRpcServices: Region server imp1tvhdpslv3.corp.test.com,16020,1475152617489 reported a fatal error:
ABORTING region server imp1tvhdpslv3.corp.test.com,16020,1475152617489: regionserver:16020-0x1572d7e3442032d, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure regionserver:16020-0x1572d7e3442032d received expired from ZooKeeper, aborting
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:613)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:524)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-07 03:01:18,509 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master
2016-10-07 03:01:18,510 ERROR [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 4 attempts
2016-10-07 03:01:18,510 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.ZKUtil: master:16000-0x1572d7e3442032c, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure Unable to get data of znode /hbase-secure/master
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:621)
at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)
at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:267)
at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1175)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1071)
at java.lang.Thread.run(Thread.java:745)
2016-10-07 03:01:18,510 ERROR [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.ZooKeeperWatcher: master:16000-0x1572d7e3442032c, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:621)
at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)
at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:267)
at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1175)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1071)
at java.lang.Thread.run(Thread.java:745)
2016-10-07 03:01:18,510 ERROR [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] master.ActiveMasterManager: master:16000-0x1572d7e3442032c, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure Error deleting our own master address node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:621)
at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)
at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:267)
at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1175)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1071)
at java.lang.Thread.run(Thread.java:745)
2016-10-07 03:01:18,512 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] hbase.ChoreService: Chore service for: imp1tvhdpmst1.corp.test.com,16000,1475152605958_splitLogManager_ had [] on shutdown
2016-10-07 03:01:18,512 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] flush.MasterFlushTableProcedureManager: stop: server shutting down.
2016-10-07 03:01:18,512 INFO  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] ipc.RpcServer: Stopping server on 16000
2016-10-07 03:01:18,512 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/tokenauth/keymaster
2016-10-07 03:01:19,512 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/tokenauth/keymaster
2016-10-07 03:01:21,512 WARN  [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/tokenauth/keymaster
2016-10-07 03:01:23,132 WARN  [LeaseRenewer:hbase@imp1tvhdpmst3.corp.test.com:8020] hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-1572577966_107] for 30 seconds.  Will retry shortly ...
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1932)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1313)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewLease(FSNamesystem.java:4534)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewLease(NameNodeRpcServer.java:1089)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewLease(ClientNamenodeProtocolServerSideTranslatorPB.java:660)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)
at org.apache.hadoop.ipc.Client.call(Client.java:1426)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy19.renewLease(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:592)
at sun.reflect.GeneratedMethodAccessor124.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy20.renewLease(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:892)
at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417)
at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442)
at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
at java.lang.Thread.run(Thread.java:745)
2016-10-07 03:01:24,135 WARN  [LeaseRenewer:hbase@imp1tvhdpmst3.corp.test.com:8020] hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-1572577966_107] for 31 seconds.  Will retry shortly ...
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1932)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1313)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewLease(FSNamesystem.java:4534)
Regionserver Log:
2016-10-07 03:01:02,894 INFO  [main-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 71395ms for sessionid 0x3572d7e3c28020d, closing socket connection and attempting reconnect
2016-10-07 03:01:02,899 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020-SendThread(imp1tvhdpmst3.corp.test.com:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x2572d7e364901d2, likely server has closed socket, closing socket connection and attempting reconnect
2016-10-07 03:01:03,603 INFO  [main-SendThread(imp1tvhdpmst2.corp.test.com:2181)] client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
2016-10-07 03:01:03,603 INFO  [main-SendThread(imp1tvhdpmst2.corp.test.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server imp1tvhdpmst2.corp.test.com/172.24.125.131:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2016-10-07 03:01:03,604 INFO  [main-SendThread(imp1tvhdpmst2.corp.test.com:2181)] zookeeper.ClientCnxn: Socket connection established to imp1tvhdpmst2.corp.test.com/172.24.125.131:2181, initiating session
2016-10-07 03:01:03,608 INFO  [main-SendThread(imp1tvhdpmst2.corp.test.com:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x3572d7e3c28020d has expired, closing socket connection
2016-10-07 03:01:03,608 FATAL [main-EventThread] regionserver.HRegionServer: ABORTING region server imp1tvhdpslv1.corp.test.com,16020,1475152617424: regionserver:16020-0x3572d7e3c28020d, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure regionserver:16020-0x3572d7e3c28020d received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:613)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:524)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2016-10-07 03:01:03,608 FATAL [main-EventThread] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint, org.apache.hadoop.hbase.security.token.TokenProvider, org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint, org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor]
2016-10-07 03:01:03,623 INFO  [main-EventThread] regionserver.HRegionServer: Dump of metrics as JSON on abort: {
  "beans" : [ {
    "name" : "java.lang:type=Memory",
    "modelerType" : "sun.management.MemoryImpl",
    "NonHeapMemoryUsage" : {
      "committed" : 142868480,
      "init" : 136773632,
      "max" : 184549376,
      "used" : 63971032
    },
    "ObjectPendingFinalizationCount" : 0,
    "Verbose" : true,
    "HeapMemoryUsage" : {
      "committed" : 8536260608,
      "init" : 8589934592,
      "max" : 8536260608,
      "used" : 795318008
    },
    "ObjectName" : "java.lang:type=Memory"
  } ],
  "beans" : [ {
    "name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
    "modelerType" : "RegionServer,sub=IPC",
    "tag.Context" : "regionserver",
    "tag.Hostname" : "imp1tvhdpslv1.corp.test.com",
    "queueSize" : 0,
    "numCallsInGeneralQueue" : 0,
    "numCallsInReplicationQueue" : 0,
    "numCallsInPriorityQueue" : 0,
    "numOpenConnections" : 0,
    "numActiveHandler" : 0,
    "TotalCallTime_num_ops" : 6766,
    "TotalCallTime_min" : 0,
    "TotalCallTime_max" : 4398,
    "TotalCallTime_mean" : 22.693319538870824,
    "TotalCallTime_median" : 0.0,
    "TotalCallTime_75th_percentile" : 1.0,
    "TotalCallTime_95th_percentile" : 85.0,
    "TotalCallTime_99th_percentile" : 251.58999999999992,
    "exceptions.FailedSanityCheckException" : 0,
    "exceptions.RegionMovedException" : 0,
    "QueueCallTime_num_ops" : 6766,
    "QueueCallTime_min" : 0,
    "QueueCallTime_max" : 25,
    "QueueCallTime_mean" : 0.11409991132131245,
    "QueueCallTime_median" : 0.0,
    "QueueCallTime_75th_percentile" : 0.0,
    "QueueCallTime_95th_percentile" : 1.0,
    "QueueCallTime_99th_percentile" : 1.0,
    "authenticationFailures" : 0,
    "authorizationFailures" : 0,
    "exceptions" : 6,
    "authenticationSuccesses" : 74,
    "authorizationSuccesses" : 74,
    "ProcessCallTime_num_ops" : 6766,
    "ProcessCallTime_min" : 0,
    "ProcessCallTime_max" : 4398,
    "ProcessCallTime_mean" : 22.579219627549513,
    "ProcessCallTime_median" : 0.0,
    "ProcessCallTime_75th_percentile" : 1.0,
    "ProcessCallTime_95th_percentile" : 85.0,
    "ProcessCallTime_99th_percentile" : 251.58999999999992,
    "exceptions.NotServingRegionException" : 6,
    "sentBytes" : 82090551,
    "exceptions.RegionTooBusyException" : 0,
    "receivedBytes" : 205125313,
    "exceptions.OutOfOrderScannerNextException" : 0,
    "exceptions.UnknownScannerException" : 0
  } ],
  "beans" : [ {
    "name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
    "modelerType" : "RegionServer,sub=Replication",
    "tag.Context" : "regionserver",
    "tag.Hostname" : "imp1tvhdpslv1.corp.test.com",
    "sink.appliedOps" : 0,
    "sink.appliedBatches" : 0,
    "sink.ageOfLastAppliedOp" : 0
  } ],
  "beans" : [ {
    "name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
    "modelerType" : "RegionServer,sub=Server",
    "tag.zookeeperQuorum" : "imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181",
    "tag.serverName" : "imp1tvhdpslv1.corp.test.com,16020,1475152617424",
    "tag.clusterId" : "af4af9f4-724e-42a0-9b08-c3ec0b260656",
    "tag.Context" : "regionserver",
    "tag.Hostname" : "imp1tvhdpslv1.corp.test.com",
    "regionCount" : 12,
    "storeCount" : 12,
    "hlogFileCount" : 2,
    "hlogFileSize" : 0,
    "storeFileCount" : 11,
    "memStoreSize" : 5088,
    "storeFileSize" : 22766460359,
    "regionServerStartTime" : 1475152617424,
    "totalRequestCount" : 3028580,
    "readRequestCount" : 6564,
    "writeRequestCount" : 252,
    "checkMutateFailedCount" : 0,
    "checkMutatePassedCount" : 0,
    "storeFileIndexSize" : 17472,
    "staticIndexSize" : 19363382,
    "staticBloomSize" : 271565340,
    "mutationsWithoutWALCount" : 0,
    "mutationsWithoutWALSize" : 0,
    "percentFilesLocal" : 66,
    "percentFilesLocalSecondaryRegions" : 0,
    "splitQueueLength" : 0,
    "compactionQueueLength" : 0,
    "flushQueueLength" : 0,
    "blockCacheFreeSize" : 3410842632,
    "blockCacheCount" : 3,
    "blockCacheSize" : 3661560,
    "blockCacheHitCount" : 3171629,
    "blockCacheHitCountPrimary" : 3171629,
    "blockCacheMissCount" : 51343,
    "blockCacheMissCountPrimary" : 51343,
    "blockCacheEvictionCount" : 4600,
    "blockCacheEvictionCountPrimary" : 4600,
    "blockCacheCountHitPercent" : 98.0,
    "blockCacheExpressHitPercent" : 99,
    "updatesBlockedTime" : 0,
    "flushedCellsCount" : 252,
    "compactedCellsCount" : 0,
    "majorCompactedCellsCount" : 0,
    "flushedCellsSize" : 57768,
    "compactedCellsSize" : 0,
    "majorCompactedCellsSize" : 0,
    "blockedRequestCount" : 0,
    "splitSuccessCount" : 1,
    "splitRequestCount" : 1,
    "Append_num_ops" : 0,
    "Append_min" : 0,
    "Append_max" : 0,
    "Append_mean" : 0.0,
    "Append_median" : 0.0,
    "Append_75th_percentile" : 0.0,
    "Append_95th_percentile" : 0.0,
    "Append_99th_percentile" : 0.0,
    "Delete_num_ops" : 0,
    "Delete_min" : 0,
    "Delete_max" : 0,
    "Delete_mean" : 0.0,
    "Delete_median" : 0.0,
    "Delete_75th_percentile" : 0.0,
    "Delete_95th_percentile" : 0.0,
    "Delete_99th_percentile" : 0.0,
    "Mutate_num_ops" : 583,
    "Mutate_min" : 1,
    "Mutate_max" : 412,
    "Mutate_mean" : 61.06003430531732,
    "Mutate_median" : 79.0,
    "Mutate_75th_percentile" : 84.0,
    "Mutate_95th_percentile" : 110.0,
    "Mutate_99th_percentile" : 126.6,
    "ScanNext_num_ops" : 21,
    "ScanNext_min" : 0,
    "ScanNext_max" : 31287,
    "ScanNext_mean" : 16041.0,
    "ScanNext_median" : 9034.0,
    "ScanNext_75th_percentile" : 12580.0,
    "ScanNext_95th_percentile" : 12680.0,
    "ScanNext_99th_percentile" : 12680.0,
    "slowDeleteCount" : 0,
    "slowIncrementCount" : 0,
    "FlushTime_num_ops" : 3,
    "FlushTime_min" : 1239,
    "FlushTime_max" : 21322,
    "FlushTime_mean" : 13931.333333333334,
    "FlushTime_median" : 19233.0,
    "FlushTime_75th_percentile" : 21322.0,
    "FlushTime_95th_percentile" : 21322.0,
    "FlushTime_99th_percentile" : 21322.0,
    "Get_num_ops" : 5372,
    "Get_min" : 0,
    "Get_max" : 47,
    "Get_mean" : 0.22412509307520476,
    "Get_median" : 0.0,
    "Get_75th_percentile" : 0.0,
    "Get_95th_percentile" : 1.0,
    "Get_99th_percentile" : 1.0,
    "Replay_num_ops" : 0,
    "Replay_min" : 0,
    "Replay_max" : 0,
    "Replay_mean" : 0.0,
    "Replay_median" : 0.0,
    "Replay_75th_percentile" : 0.0,
    "Replay_95th_percentile" : 0.0,
    "Replay_99th_percentile" : 0.0,
    "slowGetCount" : 0,
    "slowAppendCount" : 0,
    "slowPutCount" : 0,
    "SplitTime_num_ops" : 1,
    "SplitTime_min" : 4334,
    "SplitTime_max" : 4334,
    "SplitTime_mean" : 4334.0,
    "SplitTime_median" : 4334.0,
    "SplitTime_75th_percentile" : 4334.0,
    "SplitTime_95th_percentile" : 4334.0,
    "SplitTime_99th_percentile" : 4334.0,
    "Increment_num_ops" : 0,
    "Increment_min" : 0,
    "Increment_max" : 0,
    "Increment_mean" : 0.0,
    "Increment_median" : 0.0,
    "Increment_75th_percentile" : 0.0,
    "Increment_95th_percentile" : 0.0,
    "Increment_99th_percentile" : 0.0
  } ]
}
2016-10-07 03:01:03,638 INFO  [main-EventThread] regionserver.HRegionServer: STOPPED: regionserver:16020-0x3572d7e3c28020d, quorum=imp1tvhdpmst2.corp.test.com:2181,imp1tvhdpmst3.corp.test.com:2181,imp1tvhdpmst4.corp.test.com:2181, baseZNode=/hbase-secure regionserver:16020-0x3572d7e3c28020d received expired from ZooKeeper, aborting
2016-10-07 03:01:03,638 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-10-07 03:01:03,638 WARN  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] util.Sleeper: We slept 62101ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2016-10-07 03:01:03,639 INFO  [org.apache.ranger.audit.queue.AuditBatchQueue0] provider.BaseAuditHandler: Audit Status Log: name=hbaseRegional.async.summary.multi_dest.batch, finalDestination=hbaseRegional.async.summary.multi_dest.batch.hdfs, interval=01:03.866 minutes, events=1, totalEvents=585, totalSuccessCount=250
2016-10-07 03:01:03,639 INFO  [org.apache.ranger.audit.queue.AuditBatchQueue1] provider.BaseAuditHandler: Audit Status Log: name=hbaseRegional.async.summary.multi_dest.batch, finalDestination=hbaseRegional.async.summary.multi_dest.batch.solr, interval=01:03.860 minutes, events=1, totalEvents=552, totalSuccessCount=250
2016-10-07 03:01:03,639 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
2016-10-07 03:01:03,640 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.HRegionServer: Stopping infoServer
2016-10-07 03:01:03,640 INFO  [SplitLogWorker-imp1tvhdpslv1:16020] regionserver.SplitLogWorker: SplitLogWorker interrupted. Exiting. 
2016-10-07 03:01:03,640 INFO  [org.apache.ranger.audit.queue.AuditBatchQueue1] provider.BaseAuditHandler: Audit Status Log: name=hbaseRegional.async.summary.multi_dest.batch.solr, interval=08:47:58.911 hours, events=2, succcessCount=2, totalEvents=250, totalSuccessCount=250
2016-10-07 03:01:03,640 INFO  [org.apache.ranger.audit.queue.AuditBatchQueue0] provider.BaseAuditHandler: Audit Status Log: name=hbaseRegional.async.summary.multi_dest.batch.hdfs, interval=08:47:58.893 hours, events=2, succcessCount=2, totalEvents=250, totalSuccessCount=250
2016-10-07 03:01:03,640 INFO  [SplitLogWorker-imp1tvhdpslv1:16020] regionserver.SplitLogWorker: SplitLogWorker imp1tvhdpslv1.corp.test.com,16020,1475152617424 exiting
2016-10-07 03:01:03,642 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] mortbay.log: Stopped SelectChannelConnector@0.0.0.0:16030
2016-10-07 03:01:03,743 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.HeapMemoryManager: Stoping HeapMemoryTuner chore.
2016-10-07 03:01:03,744 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] snapshot.RegionServerSnapshotManager: Stopping RegionServerSnapshotManager abruptly.
2016-10-07 03:01:03,744 INFO  [MemStoreFlusher.1] regionserver.MemStoreFlusher: MemStoreFlusher.1 exiting
2016-10-07 03:01:03,744 INFO  [MemStoreFlusher.0] regionserver.MemStoreFlusher: MemStoreFlusher.0 exiting
2016-10-07 03:01:03,744 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] flush.RegionServerFlushTableProcedureManager: Stopping region server flush procedure manager abruptly.
2016-10-07 03:01:03,749 INFO  [StoreCloserThread-ambarismoketest,,1466187475382.7ee7f2de55e70e37f2fb23624f9e593e.-1] regionserver.HStore: Closed family
2016-10-07 03:01:03,750 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.HRegionServer: aborting server imp1tvhdpslv1.corp.test.com,16020,1475152617424
2016-10-07 03:01:03,750 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-2] regionserver.HRegion: Closed ambarismoketest,,1466187475382.7ee7f2de55e70e37f2fb23624f9e593e.
2016-10-07 03:01:03,750 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x2572d7e364901d2
2016-10-07 03:01:03,750 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020-SendThread(imp1tvhdpmst4.corp.test.com:2181)] client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
2016-10-07 03:01:03,752 INFO  [StoreCloserThread-du_secure_vault_tst_jcb,,1475653102504.d6e8b646cabc9a73fee39da3f71afcc9.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,752 INFO  [StoreCloserThread-du_secure_vault_onedaydata_dq,contract_main_resource_num_token_99445B2,1473230358405.57a88265f314ba4a9ff713deef241c37.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,753 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server imp1tvhdpmst4.corp.test.com/172.24.125.133:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2016-10-07 03:01:03,753 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-1] regionserver.HRegion: Closed du_secure_vault_tst_jcb,,1475653102504.d6e8b646cabc9a73fee39da3f71afcc9.
2016-10-07 03:01:03,753 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020-SendThread(imp1tvhdpmst4.corp.test.com:2181)] zookeeper.ClientCnxn: Socket connection established to imp1tvhdpmst4.corp.test.com/172.24.125.133:2181, initiating session
2016-10-07 03:01:03,755 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-0] regionserver.HRegion: Closed du_secure_vault_onedaydata_dq,contract_main_resource_num_token_99445B2,1473230358405.57a88265f314ba4a9ff713deef241c37.
2016-10-07 03:01:03,755 INFO  [StoreCloserThread-sun_test,,1464509020983.12c9c4b8684478c63917f0253bf930b8.-1] regionserver.HStore: Closed cf
2016-10-07 03:01:03,755 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-2] regionserver.HRegion: Closed sun_test,,1464509020983.12c9c4b8684478c63917f0253bf930b8.
2016-10-07 03:01:03,757 INFO  [StoreCloserThread-du_secure_vault_tst_may,,1475763178137.0d211090290d0cac9aceb17d29c2e05a.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,757 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-0] regionserver.HRegion: Closed du_secure_vault_tst_may,,1475763178137.0d211090290d0cac9aceb17d29c2e05a.
2016-10-07 03:01:03,758 INFO  [StoreCloserThread-du_hbase_krb_tst,,1465128507241.11f2c36247ad6f3348ca4013eb99daad.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,758 INFO  [StoreCloserThread-du_secure_vault_onedaydata_dq,msisdn_raw_key_97152680792,1472729475904.00d916acbfa9044e067b9b6b9ec0e601.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,759 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-1] regionserver.HRegion: Closed du_hbase_krb_tst,,1465128507241.11f2c36247ad6f3348ca4013eb99daad.
2016-10-07 03:01:03,759 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-2] regionserver.HRegion: Closed du_secure_vault_onedaydata_dq,msisdn_raw_key_97152680792,1472729475904.00d916acbfa9044e067b9b6b9ec0e601.
2016-10-07 03:01:03,762 INFO  [StoreCloserThread-du_secure_vault_onedaydata,callingnormalized_key_97155790835,1470233185954.293f370188569fa959fe704c6eca64e8.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,762 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-0] regionserver.HRegion: Closed du_secure_vault_onedaydata,callingnormalized_key_97155790835,1470233185954.293f370188569fa959fe704c6eca64e8.
2016-10-07 03:01:03,763 INFO  [StoreCloserThread-du_secure_vault_onedaydata,contract_storage_medium_num_token_849867732,1470235487700.89859097282ea3fb2868bc90ded9653a.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,764 INFO  [StoreCloserThread-du_secure_vault_onedaydata,imsi_key_42403021011613,1468488066776.0c2171e0aaba28588914a58f5f70152e.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,764 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-1] regionserver.HRegion: Closed du_secure_vault_onedaydata,contract_storage_medium_num_token_849867732,1470235487700.89859097282ea3fb2868bc90ded9653a.
2016-10-07 03:01:03,765 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-2] regionserver.HRegion: Closed du_secure_vault_onedaydata,imsi_key_42403021011613,1468488066776.0c2171e0aaba28588914a58f5f70152e.
2016-10-07 03:01:03,767 INFO  [StoreCloserThread-du_secure_vault_onedaydata_dq,cust_id_token_5568245463,1474068738847.8110480ef3807a8e15f7508a979f6fe8.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,767 INFO  [StoreCloserThread-du_secure_vault,,1465883027720.d506c1c78844ea16909341ca618e18e3.-1] regionserver.HStore: Closed vault
2016-10-07 03:01:03,767 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-0] regionserver.HRegion: Closed du_secure_vault_onedaydata_dq,cust_id_token_5568245463,1474068738847.8110480ef3807a8e15f7508a979f6fe8.
2016-10-07 03:01:03,768 INFO  [RS_CLOSE_REGION-imp1tvhdpslv1:16020-1] regionserver.HRegion: Closed du_secure_vault,,1465883027720.d506c1c78844ea16909341ca618e18e3.
2016-10-07 03:01:03,857 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] zookeeper.ZooKeeper: Session: 0x2572d7e364901d2 closed
2016-10-07 03:01:03,857 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-10-07 03:01:03,857 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.HRegionServer: stopping server imp1tvhdpslv1.corp.test.com,16020,1475152617424; all regions closed.
2016-10-07 03:01:03,895 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.Leases: regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020 closing leases
2016-10-07 03:01:03,895 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.Leases: regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020 closed leases
2016-10-07 03:01:03,896 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] hbase.ChoreService: Chore service for: imp1tvhdpslv1.corp.test.com,16020,1475152617424 had [[ScheduledChore: Name: MovedRegionsCleaner for region imp1tvhdpslv1.corp.test.com,16020,1475152617424 Period: 120000 Unit: MILLISECONDS], [ScheduledChore: Name: imp1tvhdpslv1.corp.test.com,16020,1475152617424-MemstoreFlusherChore Period: 10000 Unit: MILLISECONDS]] on shutdown
2016-10-07 03:01:08,678 INFO  [RS_OPEN_META-imp1tvhdpslv1:16020-0-MetaLogRoller] regionserver.LogRoller: LogRoller exiting.
2016-10-07 03:01:08,679 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020.logRoller] regionserver.LogRoller: LogRoller exiting.
2016-10-07 03:01:08,680 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.CompactSplitThread: Waiting for Split Thread to finish...
2016-10-07 03:01:08,680 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.CompactSplitThread: Waiting for Merge Thread to finish...
2016-10-07 03:01:08,680 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] regionserver.CompactSplitThread: Waiting for Large Compaction Thread to finish...
2016-10-07 03:01:08,680 INFO  [regionserver/imp1tvhdpslv1.corp.test.com/172.24.125.134:16020] region
1 ACCEPTED SOLUTION

avatar
Guru

The regionserver is failing because of this:

  1. 2016-10-0703:01:03,102 WARN [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] util.Sleeper:We slept 63817ms instead of 3000ms,thisis likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

Please check your GC settings and tune the GC.

View solution in original post

1 REPLY 1

avatar
Guru

The regionserver is failing because of this:

  1. 2016-10-0703:01:03,102 WARN [master/imp1tvhdpmst1.corp.test.com/172.24.125.130:16000] util.Sleeper:We slept 63817ms instead of 3000ms,thisis likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

Please check your GC settings and tune the GC.