Created 01-26-2016 12:47 PM
We installed an HDP 2.3.4 cluster with Ambari 2.2.
The HBase Master and Region Servers start, but after some time the HBase Master shuts down.
The log file says:
2016-01-25 14:46:47,340 WARN [master/node03.test.com/x.x.x.x:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=node03.test.com:2181,node02.test.com:2181,node01.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/master
2016-01-25 14:46:47,340 ERROR [master/node03.test.com/x.x.x.x:16000] zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 4 attempts
2016-01-25 14:46:47,340 WARN [master/node03.test.com/x.x.x.x:16000] zookeeper.ZKUtil: master:16000-0x3527a1898200012, quorum=node03.test.com:2181,node02.test.com:2181,node01.test.com:2181, baseZNode=/hbase-unsecure Unable to get data of znode /hbase-unsecure/master
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:745)
    at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)
    at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:267)
    at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1164)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1071)
    at java.lang.Thread.run(Thread.java:745)
2016-01-25 14:46:47,340 ERROR [master/node03.test.com/x.x.x.x:16000] zookeeper.ZooKeeperWatcher: master:16000-0x3527a1898200012, quorum=node03.test.com:2181,node02.test.com:2181,node01.test.com:2181, baseZNode=/hbase-unsecure Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:745)
    at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)
    at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:267)
    at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1164)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1071)
    at java.lang.Thread.run(Thread.java:745)
2016-01-25 14:46:47,340 ERROR [master/node03.test.com/x.x.x.x:16000] master.ActiveMasterManager: master:16000-0x3527a1898200012, quorum=node03.test.com:2181,node02.test.com:2181,node01.test.com:2181, baseZNode=/hbase-unsecure Error deleting our own master address node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:745)
    at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)
    at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:267)
    at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1164)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1071)
    at java.lang.Thread.run(Thread.java:745)
2016-01-25 14:46:47,341 INFO [master/node03.test.com/x.x.x.x:16000] hbase.ChoreService: Chore service for: node03.test.com,16000,1453750627948_splitLogManager_ had [] on shutdown
2016-01-25 14:46:47,341 INFO [master/node03.test.com/x.x.x.x:16000] flush.MasterFlushTableProcedureManager: stop: server shutting down.
2016-01-25 14:46:47,342 INFO [master/node03.test.com/x.x.x.x:16000] ipc.RpcServer: Stopping server on 16000
2016-01-25 14:46:47,342 INFO [RpcServer.listener,port=16000] ipc.RpcServer: RpcServer.listener,port=16000: stopping
2016-01-25 14:46:47,343 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2016-01-25 14:46:47,343 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2016-01-25 14:46:47,345 WARN [master/node03.test.com/x.x.x.x:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=node03.test.com:2181,node02.test.com:2181,node01.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/rs/node03.test.com,16000,1453750627948
2016-01-25 14:46:48,345 WARN [master/node03.test.com/x.x.x.x:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=node03.test.com:2181,node02.test.com:2181,node01.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/rs/node03.test.com,16000,1453750627948
2016-01-25 14:46:50,345 WARN [master/node03.test.com/x.x.x.x:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=node03.test.com:2181,node02.test.com:2181,node01.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/rs/node03.test.com,16000,1453750627948
2016-01-25 14:46:54,346 WARN [master/node03.test.com/x.x.x.x:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=node03.test.com:2181,node02.test.com:2181,node01.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/rs/node03.test.com,16000,1453750627948
2016-01-25 14:47:02,346 WARN [master/node03.test.com/x.x.x.x:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=node03.test.com:2181,node02.test.com:2181,node01.test.com:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/rs/node03.test.com,16000,1453750627948
2016-01-25 14:47:02,346 ERROR [master/node03.test.com/x.x.x.x:16000] zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 4 attempts
2016-01-25 14:47:02,347 WARN [master/node03.test.com/x.x.x.x:16000] regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/rs/node03.test.com,16000,1453750627948
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:178)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1345)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1334)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1403)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1079)
    at java.lang.Thread.run(Thread.java:745)
2016-01-25 14:47:02,350 INFO [master/node03.test.com/x.x.x.x:16000] regionserver.HRegionServer: stopping server node03.test.com,16000,1453750627948; zookeeper connection closed.
2016-01-25 14:47:02,351 INFO [master/node03.test.com/x.x.x.x:16000] regionserver.HRegionServer: master/node03.test.com/x.x.x.x:16000 exiting
What steps do I take to solve this?
Created on 01-26-2016 06:53 PM - edited 08-19-2019 04:14 AM
ZooKeeper session expirations (e.g. "org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/rs/node03.test.com,16000,1453750627948") are a common issue for HBase, and any of several underlying causes can produce this error.
Background: ZooKeeper clients must send a regular "heartbeat" (an RPC to a ZooKeeper server) to keep their session alive (state diagram credit: Apache ZooKeeper, http://zookeeper.apache.org/doc/r3.4.6/images/state_dia.jpg).
Without an active session, a client cannot interact with ZooKeeper. For HBase, this means it keeps retrying its connection to the ZooKeeper server in order to write whatever state it needs.
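The retry pattern is visible in the log in the question: the WARN lines for the /hbase-unsecure/rs/... znode arrive at 14:46:47, 14:46:48, 14:46:50, 14:46:54 and 14:47:02, i.e. after waits of roughly 1, 2, 4 and 8 seconds, followed by "ZooKeeper delete failed after 4 attempts". A minimal Python sketch of that kind of doubling backoff is below. The function name, the 1000 ms base and the retry count are assumptions read off those timestamps, not HBase's actual implementation.

```python
def backoff_schedule(base_ms, retries):
    """Return the wait (in ms) before each retry, doubling every time."""
    return [base_ms * (2 ** attempt) for attempt in range(retries)]


# Hypothetical values, chosen only to match the gaps between the WARN lines.
waits = backoff_schedule(base_ms=1000, retries=4)
print(waits)       # [1000, 2000, 4000, 8000]
print(sum(waits))  # 15000 ms from the first warning until giving up
```

Note that retrying cannot rescue a session that has already expired: ZooKeeper does not let a client resume an expired session, so the process has to open a new one, which is why the Master in the log ends up exiting.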
Some common reasons that this heartbeat fails:
While removing the contents of HBase's root znode can be a temporary fix (especially with older versions, which rely heavily on ZooKeeper for region assignments), needing to do so usually points to a larger underlying problem that will keep recurring.
Created 11-14-2017 08:18 AM
On my cluster, NiFi and one of the three HBase Region Servers run on the same server. I modified NiFi's bootstrap.conf file and uncommented java.arg.13=-XX:+UseG1GC, after which the Region Server stopped. I tried restarting it many times; each time it started, it soon stopped again, until I commented out the java.arg.13=-XX:+UseG1GC property again. It works now. I think the property changes the server's JVM garbage collection behavior.
Created 05-29-2017 07:02 AM
Setting the timeouts in the HBase configuration did not work for me; the session timeout was being derived from ZooKeeper's tickTime instead. Here's more info: https://superuser.blog/hbase-dead-regionserver/
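This is consistent with how ZooKeeper negotiates session timeouts: the client only requests a timeout, and the server clamps it between minSessionTimeout and maxSessionTimeout, which default to 2 and 20 times tickTime. A minimal Python sketch of that clamping follows; the function and parameter names are hypothetical and the numbers are only illustrative.

```python
def negotiated_timeout_ms(requested_ms, tick_time_ms,
                          min_session_ms=None, max_session_ms=None):
    """Clamp a client's requested session timeout to the server's bounds.

    When the bounds are not set explicitly, fall back to ZooKeeper's
    defaults of 2 * tickTime (minimum) and 20 * tickTime (maximum).
    """
    lower = 2 * tick_time_ms if min_session_ms is None else min_session_ms
    upper = 20 * tick_time_ms if max_session_ms is None else max_session_ms
    return max(lower, min(requested_ms, upper))


# Illustrative values: a client asking for 90 s against tickTime=2000 ms
# is capped at 20 * 2000 = 40 s.
print(negotiated_timeout_ms(90000, 2000))  # 40000
# Raising tickTime (or maxSessionTimeout) lifts the cap.
print(negotiated_timeout_ms(90000, 6000))  # 90000
```

So a larger timeout requested on the HBase side only takes effect if the ZooKeeper server's tickTime or maxSessionTimeout allows it.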
Created 11-08-2017 09:11 PM
I hit this issue on my Cloudera cluster. I restarted the ZooKeeper and HBase services, and it is working now.