Created on 09-23-2017 01:10 AM - edited 09-16-2022 05:17 AM
hello:
My CM version is 5.12.0.
There 3 zookeeper server instance on machine 10.0.0.8/13/14. The SERVICE MONITOR lies on 10.0.0.8.
1. A Canary error occurs on CM dashboard of zookeeper ; -- sovled by changing the zookeeper max connections to 1000;
2. The Canary error occurs again after a while; then I find the SERVICE MONITOR take up all the 1000 zookeeper connections.
3. I disabled the Zookeeper Canary test, it will not show the error on CM dashboard.
4. When I run HBase shell on 10.0.0.8, it report an error:
[root@inspire-dev-3 bin]# hbase shell 17/09/23 08:03:41 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 17/09/23 08:03:59 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts 17/09/23 08:03:59 ERROR zookeeper.ZooKeeperWatcher: hconnection-0x3202c09d0x0, quorum=inspire-dev-3:2181,inspire-dev-6:2181,inspire-dev-5:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220) at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:419) at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65) at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105) at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:919) at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:657) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
5. I checked the zookeeper connections by "echo cons | nc 127.0.0.1 2181", it shows 1008 connections.
6. I checked who establised the connections to 2181 by "netstat -npt | grep 2181", it shows the SERVICE MONITOR process have all the connections.
7. I restart the service monitor server, and the hbase shell becomes good.
This post says that it's a bug of SERVICE MONITOR, and has been fixed in 5.3.0, but my version is 5.12.0, and it seems the bug still exists.
So, what should I do now? I must restart SERVICE MONITOR from time to time.
Created on 09-24-2017 08:49 PM - edited 09-24-2017 08:56 PM
Does my description unclear?
Can you give me some help? @mbigelow @csguna @saranvisa
Created 09-24-2017 10:32 PM
Created 09-24-2017 10:35 PM
Thanks for reply. Let me check.
Created 09-24-2017 10:39 PM
My settings:
Created on 09-24-2017 10:41 PM - edited 09-24-2017 10:43 PM
Now it's not the Canary problem, because I have closed the Canary test. But the Zookeeper connections still be used up by SM.
Created 09-24-2017 10:46 PM
Created 09-24-2017 10:54 PM
We need to figure out what SM is doing. Either go through the ZK logs to determine what node it is accessing. This may require you to increase the logging level.
OK, increase it to DEBUG level? Now it is WARN level, and the log shows as:
2017-09-25 05:52:44,823 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Too many connections from /10.0.0.8 - max is 1000 2017-09-25 05:53:02,317 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Too many connections from /10.0.0.8 - max is 1000 2017-09-25 05:53:05,519 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Too many connections from /10.0.0.8 - max is 1000 2017-09-25 05:53:09,364 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Too many connections from /10.0.0.8 - max is 1000 2017-09-25 05:53:12,528 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Too many connections from /10.0.0.8 - max is 1000
I want to make sure that I got this right. The Canary test is disabled, since being disabled you have restarted the Service Monitor, and since restarting the connections from SM to ZK have climbed until hitting the maximum of 1000. Do I have that right?
Yes, correct. I have restart it several times.
Created 09-24-2017 11:04 PM
Created 09-24-2017 11:25 PM
hello:
No your pointed kind of logs found, there are much number of this kind of logs, please check if it is useful:
2017-09-25 06:22:40,193 DEBUG org.apache.zookeeper.server.FinalRequestProcessor: sessionid:0x15eb7a1b1400001 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a 2017-09-25 06:22:41,739 DEBUG org.apache.zookeeper.server.quorum.CommitProcessor: Processing request:: sessionid:0x15eb7a1b1400002 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a 2017-09-25 06:22:41,739 DEBUG org.apache.zookeeper.server.FinalRequestProcessor: Processing request:: sessionid:0x15eb7a1b1400002 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a 2017-09-25 06:22:41,739 DEBUG org.apache.zookeeper.server.FinalRequestProcessor: sessionid:0x15eb7a1b1400002 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a 2017-09-25 06:22:42,193 DEBUG org.apache.zookeeper.server.quorum.CommitProcessor: Processing request:: sessionid:0x15eb7a1b1400001 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a 2017-09-25 06:22:42,193 DEBUG org.apache.zookeeper.server.FinalRequestProcessor: Processing request:: sessionid:0x15eb7a1b1400001 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a 2017-09-25 06:22:42,193 DEBUG org.apache.zookeeper.server.FinalRequestProcessor: sessionid:0x15eb7a1b1400001 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a 2017-09-25 06:22:42,394 DEBUG org.apache.zookeeper.server.quorum.CommitProcessor: Processing request:: sessionid:0x15e7e48142b2eb2 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a 2017-09-25 06:22:42,394 DEBUG org.apache.zookeeper.server.FinalRequestProcessor: Processing request:: sessionid:0x15e7e48142b2eb2 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a 2017-09-25 06:22:42,394 DEBUG org.apache.zookeeper.server.FinalRequestProcessor: sessionid:0x15e7e48142b2eb2 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
Is it doing PING to zookeeper?