HBase region server crashing

In one of our clusters, the HBase region servers crash with what appears to be a ZooKeeper error. The relevant region server and ZooKeeper logs are below:

Region server log:

2018-01-24 10:26:56,669 WARN [regionserver60020] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=host02.domain.local:2181,host01.domain.local:2181,host04.domain.local:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/host09.domain.local,60020,1516733354773
2018-01-24 10:26:56,669 ERROR [regionserver60020] zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 4 attempts
2018-01-24 10:26:59,316 WARN [regionserver60020] regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/host09.domain.local,60020,1516733354773
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:156)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1271)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1260)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1337)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1051)
    at java.lang.Thread.run(Thread.java:745)
2018-01-24 10:27:02,727 INFO [regionserver60020] regionserver.HRegionServer: stopping server host09.domain.local,60020,1516733354773; zookeeper connection closed.
2018-01-24 10:27:02,727 INFO [regionserver60020] regionserver.HRegionServer: regionserver60020 exiting
2018-01-24 10:27:02,963 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: HRegionServer Aborted

ZooKeeper log:

2018-01-24 10:26:41,250 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /000.00.0.00:57328 (no session established for client)
2018-01-24 10:26:56,654 - ERROR [LearnerHandler-/000.00.0.00:57919:LearnerHandler@633] - Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:546)
2018-01-24 10:26:56,654 - WARN [LearnerHandler-/000.00.0.00:57919:LearnerHandler@646] - ******* GOODBYE /000.00.0.00:57919 ********

I would appreciate any insights on how to resolve this.

Re: HBase region server crashing

@ n c

Try this and see if it works. Delete the following properties from hbase-site:

- hbase.bucketcache.ioengine
- hbase.bucketcache.percentage.in.combinedcache
- hbase.bucketcache.size
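
If hbase-site is edited by hand as an XML file, the three properties named above could be stripped with something like the following. This is only a minimal sketch: it assumes the usual Hadoop-style layout (a root element containing `<property>` entries with `<name>` and `<value>` children), the helper name `remove_properties` is made up for illustration, and the sample values are placeholders rather than anything from the cluster in question. On a cluster managed through an admin console, the change would normally be made there instead, since a hand-edited file may be overwritten.

```python
import xml.etree.ElementTree as ET

# The three properties named in the reply above.
BUCKETCACHE_PROPERTIES = {
    "hbase.bucketcache.ioengine",
    "hbase.bucketcache.percentage.in.combinedcache",
    "hbase.bucketcache.size",
}


def remove_properties(xml_text, names):
    """Drop every <property> whose <name> is in `names`.

    Hypothetical helper. Returns (new_xml_text, removed_names).
    """
    root = ET.fromstring(xml_text)
    removed = []
    # Iterate over a copy of the list because we remove while looping.
    for prop in list(root.findall("property")):
        name = (prop.findtext("name") or "").strip()
        if name in names:
            root.remove(prop)
            removed.append(name)
    return ET.tostring(root, encoding="unicode"), removed


if __name__ == "__main__":
    # Placeholder configuration; the values here are invented for the demo.
    sample = """<configuration>
  <property><name>hbase.bucketcache.ioengine</name><value>offheap</value></property>
  <property><name>hbase.bucketcache.size</name><value>1024</value></property>
  <property><name>zookeeper.session.timeout</name><value>90000</value></property>
</configuration>"""
    new_xml, removed = remove_properties(sample, BUCKETCACHE_PROPERTIES)
    print("removed:", removed)
    print(new_xml)
```

Back up the original file first; the default `ElementTree` parser does not preserve XML comments, so the rewritten file may differ from the original in more than the removed entries. The region servers would need a restart for the change to take effect.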
