Support Questions

Find answers, ask questions, and share your expertise

Zookeeper connections are taken up by service monitor

avatar
Explorer

hello:

 

My CM version is 5.12.0.

There 3 zookeeper server instance on machine 10.0.0.8/13/14. The SERVICE MONITOR lies on 10.0.0.8.

1. A Canary error occurs on CM dashboard of zookeeper ; -- sovled by changing the zookeeper max connections to 1000;

2. The Canary error occurs again after a while;  then I find the SERVICE MONITOR take up all the 1000 zookeeper connections. 

3. I disabled the Zookeeper Canary test, it will not show the error on CM dashboard.

4. When I run HBase shell on 10.0.0.8, it report an error:

[root@inspire-dev-3 bin]# hbase shell
17/09/23 08:03:41 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
17/09/23 08:03:59 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
17/09/23 08:03:59 ERROR zookeeper.ZooKeeperWatcher: hconnection-0x3202c09d0x0, quorum=inspire-dev-3:2181,inspire-dev-6:2181,inspire-dev-5:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:419)
at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:919)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:657)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

5. I checked the zookeeper connections by  "echo cons | nc 127.0.0.1 2181", it shows 1008 connections. 

6. I checked who establised the connections to 2181 by "netstat -npt | grep 2181", it shows the SERVICE MONITOR process have all the connections.

7. I restart the service monitor server, and the hbase shell becomes good.

 

This post says that it's a bug of SERVICE MONITOR, and has been fixed in 5.3.0, but my version is 5.12.0, and it seems the bug still exists.

 

So, what should I do now? I must restart SERVICE MONITOR from time to time.

12 REPLIES 12

avatar
Champion
That appears to be between the ZK servers. Anything useful in the SM logs?

avatar
Explorer

@mbigelow

There are lots of logs like these in SM log:

2017-09-26 01:18:50,715 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x40b71007 connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:18:50,733 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x178b2a6f connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:19:50,735 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x4a38698e connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:19:50,754 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x60599b95 connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:20:50,771 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x6f656e8a connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:20:50,792 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x22d3d22c connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:21:50,775 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x3b64eef7 connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:21:50,796 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x333cc0f1 connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181

And there are some WARNINGs :

2017-09-26 00:57:50,306 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0xf9c174f connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspir
e-dev-3:2181,inspire-dev-5:2181
2017-09-26 00:57:50,325 WARN com.cloudera.cmon.firehose.polling.CdhTask: (14 skipped) Exception in doWork for task: hbase_HBASE_SERVICE_STATE_TASK
java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
        at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:412)
        at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:405)
        at org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal(ConnectionManager.java:283)
        at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:116)

 

avatar
Explorer

@mbigelow  

Any ideas?