Support Questions

Find answers, ask questions, and share your expertise
Announcements
Welcome to the upgraded Community! Read this blog to see What’s New!

Zookeeper connections are taken up by service monitor

avatar
Explorer

hello:

 

My CM version is 5.12.0.

There 3 zookeeper server instance on machine 10.0.0.8/13/14. The SERVICE MONITOR lies on 10.0.0.8.

1. A Canary error occurs on CM dashboard of zookeeper ; -- sovled by changing the zookeeper max connections to 1000;

2. The Canary error occurs again after a while;  then I find the SERVICE MONITOR take up all the 1000 zookeeper connections. 

3. I disabled the Zookeeper Canary test, it will not show the error on CM dashboard.

4. When I run HBase shell on 10.0.0.8, it report an error:

[root@inspire-dev-3 bin]# hbase shell
17/09/23 08:03:41 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
17/09/23 08:03:59 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
17/09/23 08:03:59 ERROR zookeeper.ZooKeeperWatcher: hconnection-0x3202c09d0x0, quorum=inspire-dev-3:2181,inspire-dev-6:2181,inspire-dev-5:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:419)
at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:919)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:657)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

5. I checked the zookeeper connections by  "echo cons | nc 127.0.0.1 2181", it shows 1008 connections. 

6. I checked who establised the connections to 2181 by "netstat -npt | grep 2181", it shows the SERVICE MONITOR process have all the connections.

7. I restart the service monitor server, and the hbase shell becomes good.

 

This post says that it's a bug of SERVICE MONITOR, and has been fixed in 5.3.0, but my version is 5.12.0, and it seems the bug still exists.

 

So, what should I do now? I must restart SERVICE MONITOR from time to time.

12 REPLIES 12

avatar
Explorer

Does my description unclear? 

Can you give me some help? @mbigelow @csguna @saranvisa

avatar
Champion
I have never seen such and issues with ZK and SM. What do you have set for the SM timeout values in ZK. I may have not seen this as I have these set. This alone should prevent SM from consuming all of the connections to ZK.

ZooKeeper Canary Connection Timeout
ZooKeeper Canary Session Timeout
ZooKeeper Canary Operation Timeout

avatar
Explorer

Thanks for reply. Let me check.

avatar
Explorer

My settings:

ZooKeeper Canary Connection Timeout  : 10 seconds
ZooKeeper Canary Session Timeout       : 30 seconds
ZooKeeper Canary Operation Timeout    : 30 seconds
 
How many should I set to?  

avatar
Explorer

Now it's not the Canary problem, because I have closed the Canary test.  But the Zookeeper connections still be used up by SM.

avatar
Champion
Those are the defaults and should be good.

Hmm, the Service Monitor is what is running the canary test and the likely source of the connections.

We need to figure out what SM is doing. Either go through the ZK logs to determine what node it is accessing. This may require you to increase the logging level.

I want to make sure that I got this right. The Canary test is disabled, since being disabled you have restarted the Service Monitor, and since restarting the connections from SM to ZK have climbed until hitting the maximum of 1000. Do I have that right?

avatar
Explorer

 

We need to figure out what SM is doing. Either go through the ZK logs to determine what node it is accessing. This may require you to increase the logging level.

OK, increase it to DEBUG level? Now it is WARN level, and the log shows as:

2017-09-25 05:52:44,823 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Too many connections from /10.0.0.8 - max is 1000
2017-09-25 05:53:02,317 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Too many connections from /10.0.0.8 - max is 1000
2017-09-25 05:53:05,519 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Too many connections from /10.0.0.8 - max is 1000
2017-09-25 05:53:09,364 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Too many connections from /10.0.0.8 - max is 1000
2017-09-25 05:53:12,528 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Too many connections from /10.0.0.8 - max is 1000

 

I want to make sure that I got this right. The Canary test is disabled, since being disabled you have restarted the Service Monitor, and since restarting the connections from SM to ZK have climbed until hitting the maximum of 1000. Do I have that right?

Yes, correct. I have restart it several times.

avatar
Champion
Looking for something that shows which znode it is accessing as that will give us a clue as to what it is doing. I would try INFO before DEBUG but you may have to go to that level. Also, try the SM logs since it is the client and may have the info there.

As an example the below line shows that the HBase client was trying to access the znode /hbase/hbaseid
This is what I am thinking should be in the logs to help us figure this out.

org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid

avatar
Explorer

hello:

No your pointed kind of logs found, there are much number of this kind of logs, please check if it is useful:

 

2017-09-25 06:22:40,193 DEBUG org.apache.zookeeper.server.FinalRequestProcessor: sessionid:0x15eb7a1b1400001 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
2017-09-25 06:22:41,739 DEBUG org.apache.zookeeper.server.quorum.CommitProcessor: Processing request:: sessionid:0x15eb7a1b1400002 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
2017-09-25 06:22:41,739 DEBUG org.apache.zookeeper.server.FinalRequestProcessor: Processing request:: sessionid:0x15eb7a1b1400002 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
2017-09-25 06:22:41,739 DEBUG org.apache.zookeeper.server.FinalRequestProcessor: sessionid:0x15eb7a1b1400002 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
2017-09-25 06:22:42,193 DEBUG org.apache.zookeeper.server.quorum.CommitProcessor: Processing request:: sessionid:0x15eb7a1b1400001 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
2017-09-25 06:22:42,193 DEBUG org.apache.zookeeper.server.FinalRequestProcessor: Processing request:: sessionid:0x15eb7a1b1400001 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
2017-09-25 06:22:42,193 DEBUG org.apache.zookeeper.server.FinalRequestProcessor: sessionid:0x15eb7a1b1400001 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
2017-09-25 06:22:42,394 DEBUG org.apache.zookeeper.server.quorum.CommitProcessor: Processing request:: sessionid:0x15e7e48142b2eb2 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
2017-09-25 06:22:42,394 DEBUG org.apache.zookeeper.server.FinalRequestProcessor: Processing request:: sessionid:0x15e7e48142b2eb2 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
2017-09-25 06:22:42,394 DEBUG org.apache.zookeeper.server.FinalRequestProcessor: sessionid:0x15e7e48142b2eb2 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a

Is it doing PING to zookeeper?

 

avatar
Champion
That appears to be between the ZK servers. Anything useful in the SM logs?

avatar
Explorer

@mbigelow

There are lots of logs like these in SM log:

2017-09-26 01:18:50,715 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x40b71007 connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:18:50,733 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x178b2a6f connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:19:50,735 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x4a38698e connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:19:50,754 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x60599b95 connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:20:50,771 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x6f656e8a connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:20:50,792 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x22d3d22c connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:21:50,775 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x3b64eef7 connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181
2017-09-26 01:21:50,796 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x333cc0f1 connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspire-dev-3:2181,inspire-dev-5:2181

And there are some WARNINGs :

2017-09-26 00:57:50,306 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0xf9c174f connecting to ZooKeeper ensemble=inspire-dev-6:2181,inspir
e-dev-3:2181,inspire-dev-5:2181
2017-09-26 00:57:50,325 WARN com.cloudera.cmon.firehose.polling.CdhTask: (14 skipped) Exception in doWork for task: hbase_HBASE_SERVICE_STATE_TASK
java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
        at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:412)
        at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:405)
        at org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal(ConnectionManager.java:283)
        at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:116)

 

avatar
Explorer

@mbigelow  

Any ideas? 

Labels