HBase master down

Contributor

Hi everyone,

I have a 6-node cluster with HA enabled.

One of the HBase masters went down. When I restart the HBase master, it comes up without any issue.

Can you please tell me how to solve this?

Please find the log details below.

2018-07-02 06:13:14,943 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=2.19 MB, freeSize=2.08 GB, max=2.08 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=238700, evicted=0, evictedPerRun=0.0

2018-07-02 06:18:14,943 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=2.19 MB, freeSize=2.08 GB, max=2.08 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=238730, evicted=0, evictedPerRun=0.0

2018-07-02 06:23:14,943 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=2.19 MB, freeSize=2.08 GB, max=2.08 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=238760, evicted=0, evictedPerRun=0.0

2018-07-02 06:28:14,943 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=2.19 MB, freeSize=2.08 GB, max=2.08 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=238790, evicted=0, evictedPerRun=0.0

2018-07-02 06:33:14,943 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=2.19 MB, freeSize=2.08 GB, max=2.08 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=238820, evicted=0, evictedPerRun=0.0

2018-07-02 06:38:14,943 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=2.19 MB, freeSize=2.08 GB, max=2.08 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=238850, evicted=0, evictedPerRun=0.0

2018-07-02 06:42:03,335 INFO [master/server3.covert.com/IP:16000-SendThread(server3.covert.com:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26680ms for sessionid 0x363c8a174c329a7, closing socket connection and attempting reconnect

2018-07-02 06:42:03,469 INFO [main-SendThread(server3.covert.com:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26676ms for sessionid 0x163cb19387303c0, closing socket connection and attempting reconnect

2018-07-02 06:42:03,954 INFO [timeline] timeline.HadoopTimelineMetricsSink: Unable to connect to collector, http://server4.covert.com:6188/ws/v1/timeline/metrics

This exceptions will be ignored for next 100 times

2018-07-02 06:42:03,955 WARN [timeline] timeline.HadoopTimelineMetricsSink: Unable to send metrics to collector by address:http://server4.covert.com:6188/ws/v1/timeline/metrics

2018-07-02 06:42:04,008 INFO [main-SendThread(server2.covert.com:2181)] client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.

2018-07-02 06:42:04,009 INFO [main-SendThread(server2.covert.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server server2.covert.com/IP1:2181. Will attempt to SASL-authenticate using Login Context section 'Client'

2018-07-02 06:42:04,036 INFO [master/server3.covert.com/IP:16000-SendThread(server2.covert.com:2181)] client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.

2018-07-02 06:42:04,037 INFO [master/server3.covert.com/IP:16000-SendThread(server2.covert.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server server2.covert.com/IP1:2181. Will attempt to SASL-authenticate using Login Context section 'Client'

2018-07-02 06:42:35,085 WARN [master/server3.covert.com/IP:16000] util.Sleeper: We slept 30621ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

2018-07-02 06:42:38,645 INFO [main-SendThread(server2.covert.com:2181)] zookeeper.ClientCnxn: Socket connection established to server2.covert.com/IP1:2181, initiating session

2018-07-02 06:42:38,647 INFO [main-SendThread(server2.covert.com:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x163cb19387303c0 has expired, closing socket connection

2018-07-02 06:42:38,648 FATAL [main-EventThread] master.HMaster: master:16000-0x163cb19387303c0, quorum=server3.covert.com:2181,server1.covert.com:2181,server2.covert.com:2181, baseZNode=/hbase-secure master:16000-0x163cb19387303c0 received expired from ZooKeeper, aborting

org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired

at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:592)

at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:524)

at org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)

at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)

at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)

2018-07-02 06:42:38,650 INFO [main-EventThread] regionserver.HRegionServer: STOPPED: master:16000-0x163cb19387303c0, quorum=server3.covert.com:2181,server1.covert.com:2181,server2.covert.com:2181, baseZNode=/hbase-secure master:16000-0x163cb19387303c0 received expired from ZooKeeper, aborting

2018-07-02 06:42:38,650 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down

2018-07-02 06:42:38,650 INFO [master/server3.covert.com/IP:16000] regionserver.HRegionServer: Stopping infoServer

2018-07-02 06:42:38,663 INFO [master/server3.covert.com/IP:16000] mortbay.log: Stopped SelectChannelConnector@0.0.0.0:16010

2018-07-02 06:42:38,669 INFO [master/server3.covert.com/IP:16000-SendThread(server2.covert.com:2181)] zookeeper.ClientCnxn: Socket connection established to server2.covert.com/IP1:2181, initiating session

2018-07-02 06:42:38,671 INFO [master/server3.covert.com/IP:16000-SendThread(server2.covert.com:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x363c8a174c329a7 has expired, closing socket connection

2018-07-02 06:42:38,671 WARN [master/server3.covert.com/IP:16000-EventThread] client.ConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, closing it. It will be recreated next time someone needs it

org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired

at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:592)

at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:524)

at org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)

at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)

at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)

2018-07-02 06:42:38,671 INFO [master/server3.covert.com/IP:16000-EventThread] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x363c8a174c329a7

2018-07-02 06:42:38,671 INFO [master/server3.covert.com/IP:16000-EventThread] zookeeper.ClientCnxn: EventThread shut down

2018-07-02 06:42:38,766 INFO [master/server3.covert.com/IP:16000] regionserver.HRegionServer: stopping server server3.covert.com,16000,1528124894011

2018-07-02 06:42:38,769 INFO [master/server3.covert.com/IP:16000] regionserver.HRegionServer: stopping server server3.covert.com,16000,1528124894011; all regions closed.

2018-07-02 06:42:38,769 INFO [master/server3.covert.com/IP:16000] hbase.ChoreService: Chore service for: server3.covert.com,16000,1528124894011 had [] on shutdown

2018-07-02 06:42:38,769 WARN [master/server3.covert.com/IP:16000] zookeeper.ZKUtil: master:16000-0x163cb19387303c0, quorum=server3.covert.com:2181,server1.covert.com:2181,server2.covert.com:2181, baseZNode=/hbase-secure Unable to get data of znode /hbase-secure/master

org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master

at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)

at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)

at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)

at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354)

at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:622)

at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)

at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:267)

at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1249)

at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1118)

at java.lang.Thread.run(Thread.java:745)

2018-07-02 06:42:38,770 ERROR [master/server3.covert.com/IP:16000] zookeeper.ZooKeeperWatcher: master:16000-0x163cb19387303c0, quorum=server3.covert.com:2181,server1.covert.com:2181,server2.covert.com:2181, baseZNode=/hbase-secure Received unexpected KeeperException, re-throwing exception

org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master

at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)

at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)

at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)

at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354)

at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:622)

at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)

at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:267)

at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1249)

at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1118)

at java.lang.Thread.run(Thread.java:745)

2018-07-02 06:42:38,770 ERROR [master/server3.covert.com/IP:16000] master.ActiveMasterManager: master:16000-0x163cb19387303c0, quorum=server3.covert.com:2181,server1.covert.com:2181,server2.covert.com:2181, baseZNode=/hbase-secure Error deleting our own master address node

org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/master

at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)

at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)

at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)

at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354)

at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:622)

at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)

at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:267)

at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1249)

at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1118)

at java.lang.Thread.run(Thread.java:745)

2018-07-02 06:42:38,770 INFO [master/server3.covert.com/IP:16000] ipc.RpcServer: Stopping server on 16000

2018-07-02 06:42:38,770 INFO [master/server3.covert.com/IP:16000] token.AuthenticationTokenSecretManager: Stopping leader election, because: SecretManager stopping

2018-07-02 06:42:38,770 INFO [RpcServer.listener,port=16000] ipc.RpcServer: RpcServer.listener,port=16000: stopping

2018-07-02 06:42:38,771 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped

2018-07-02 06:42:38,777 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping

2018-07-02 06:42:38,787 WARN [master/server3.covert.com/IP:16000] regionserver.HRegionServer: Failed deleting my ephemeral node

org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-secure/rs/server3.covert.com,16000,1528124894011

at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)

at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)

at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)

at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:178)

at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1222)

at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1211)

at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1528)

at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1126)

at java.lang.Thread.run(Thread.java:745)

2018-07-02 06:42:38,791 INFO [master/server3.covert.com/IP:16000] regionserver.HRegionServer: stopping server server3.covert.com,16000,1528124894011; zookeeper connection closed.

2018-07-02 06:42:38,791 INFO [master/server3.covert.com/IP:16000] regionserver.HRegionServer: master/server3.covert.com/IP:16000 exiting

1 REPLY

Re: HBase master down

Contributor

@kanna k

Please check whether ZooKeeper is up and working.
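
A quick way to do this check from any node is ZooKeeper's four-letter `ruok` command, which a running server answers with `imok`. The following is only a minimal Python sketch: the helper names `zk_four_letter` and `check_quorum` are made up for illustration, port 2181 is taken from the quorum string in the posted log, and on ZooKeeper 3.5 and later the command may need to be allowed through `4lw.commands.whitelist` before it responds.

```python
import socket
from typing import Dict, Iterable


def zk_four_letter(host: str, port: int = 2181, cmd: str = "ruok",
                   timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        # The server closes the connection after replying, so read until EOF.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def check_quorum(hosts: Iterable[str], port: int = 2181) -> Dict[str, bool]:
    """Return {host: True/False}, True meaning the server answered 'imok'."""
    status = {}
    for host in hosts:
        try:
            status[host] = zk_four_letter(host, port).strip() == "imok"
        except OSError:
            # Refused connection, timeout, DNS failure: treat as not healthy.
            status[host] = False
    return status
```

For the quorum in the posted log this would be called as `check_quorum(["server1.covert.com", "server2.covert.com", "server3.covert.com"])`. An `imok` reply only shows that the server process is running and answering on its client port. It does not rule out slow responses or a long pause on the HBase master side, so the logs still need to be checked.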

Check the ZooKeeper logs from the time the issue was observed.
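
To know which time window to look at in the ZooKeeper logs, it can help to first pull the pause and session events out of the master log. In the log posted above, the `util.Sleeper` warning (slept 30621ms instead of 3000ms) and the two client timeouts (about 26.7 seconds without hearing from the server) fall in the same window, which may point at a stall of the master process itself, such as a long GC pause, rather than at ZooKeeper being down. Below is a rough Python sketch for extracting those events. The function name `scan_master_log` and the event labels are illustrative, and the regular expressions are written only against the message formats visible in this post.

```python
import re
from typing import Iterable, List, Tuple

# Leading timestamp as it appears in the posted log, e.g. 2018-07-02 06:42:35,085
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")

# Event label -> pattern; labels are illustrative, not HBase terminology.
PATTERNS = {
    "pause": re.compile(r"We slept (\d+)ms instead of (\d+)ms"),
    "zk_timeout": re.compile(
        r"have not heard from server in (\d+)ms for sessionid (0x[0-9a-fA-F]+)"
    ),
    "zk_expired": re.compile(r"session (0x[0-9a-fA-F]+) has expired"),
}

Event = Tuple[str, str, Tuple[str, ...]]


def scan_master_log(lines: Iterable[str]) -> List[Event]:
    """Return (timestamp, label, captured values) for each matching log line."""
    events: List[Event] = []
    for line in lines:
        ts = TIMESTAMP.match(line)
        if not ts:
            continue  # stack-trace lines and other continuation lines
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                events.append((ts.group(1), label, match.groups()))
    return events
```

Running it over the master log file, for example `scan_master_log(open(path))` with `path` pointing at the HBase master log, gives a short timeline to compare against the ZooKeeper server logs and the master's GC log for the same minutes.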

If possible, after masking confidential information, can you send the ZooKeeper log entries from the time the issue was observed?