HBase region server going down

I frequently see the messages below in the region server logs, after which that region server goes down. Is there a particular reason for this?

2016-10-12 07:24:51,105 WARN [regionserver/hostname/10.107.107.152:16020] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181,zk2:2181,zk3:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/rs/hostname,16020,1475939210143
2016-10-12 07:24:59,105 WARN [regionserver/hostname/10.107.107.152:16020] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181,zk2:2181,zk3:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/rs/hostname,16020,1475939210143
2016-10-12 07:24:59,105 ERROR [regionserver/hostname/10.107.107.152:16020] zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 4 attempts
2016-10-12 07:24:59,105 WARN [regionserver/hostname/10.107.107.152:16020] regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/rs/hostname,16020,1475939210143
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:178)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1221)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1210)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1403)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1079)
    at java.lang.Thread.run(Thread.java:745)
2016-10-12 07:24:59,108 INFO [regionserver/hostname/10.107.107.152:16020] regionserver.HRegionServer: stopping server hostname,16020,1475939210143; zookeeper connection closed.
2016-10-12 07:24:59,108 INFO [regionserver/hostname/10.107.107.152:16020] regionserver.HRegionServer: regionserver/hostname/10.107.107.152:16020 exiting
2016-10-12 07:24:59,108 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
    at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:68)
    at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2651)

2 REPLIES

Super Guru

Check for:

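1. JVM GC pauses. If the JVM does a stop-the-world garbage collection, the region server cannot heartbeat to ZooKeeper for the duration of the pause. If the pause outlasts the ZooKeeper session timeout, the server is disconnected from ZK and will likely lose its session. Read the lines in the HBase service log just before this error and look for pause warnings.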

2. Errors in the ZooKeeper log about maxClientCnxns (https://community.hortonworks.com/articles/51191/understanding-apache-zookeeper-connection-rate-lim.html)

3. Operating system swappiness. Ensure it is reduced from the default (often 30 or 60) to a value of 0. You can inspect the current value via `cat /proc/sys/vm/swappiness`.
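Checks 1 and 3 can be scripted. Below is a minimal Python sketch, not an official HBase tool. It assumes the region server log contains pause warnings worded like those written by HBase's `JvmPauseMonitor` ("pause of approximately N ms") or `Sleeper` ("We slept N ms instead of ..."); verify the exact wording against your own logs, since it may differ between versions. The function names and the 10000 ms default threshold are hypothetical; pick a threshold close to your configured ZooKeeper session timeout.

```python
import re
import sys
from pathlib import Path

# ASSUMPTION: pause warnings look like the JvmPauseMonitor / Sleeper messages.
# Adjust these patterns if your HBase version words them differently.
PAUSE_PATTERNS = [
    re.compile(r"pause of approximately (\d+)\s*ms"),
    re.compile(r"We slept (\d+)\s*ms instead of"),
]


def find_long_pauses(log_lines, threshold_ms=10000):
    """Return (line, pause_ms) pairs for reported pauses >= threshold_ms.

    threshold_ms is a placeholder default, not an HBase setting.
    """
    hits = []
    for line in log_lines:
        for pattern in PAUSE_PATTERNS:
            match = pattern.search(line)
            if match:
                pause_ms = int(match.group(1))
                if pause_ms >= threshold_ms:
                    hits.append((line.rstrip("\n"), pause_ms))
                break  # one pause figure per log line is enough
    return hits


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the current vm.swappiness value, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    # Usage: python check_rs.py /path/to/regionserver.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log_file:
            for line, pause_ms in find_long_pauses(log_file):
                print(f"{pause_ms} ms pause: {line}")
    print("vm.swappiness =", read_swappiness())
```

If the script reports pauses shortly before the `SessionExpiredException` timestamps, GC tuning or heap sizing is the likely place to look next. It does not cover check 2, which requires reading the ZooKeeper server log.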