Created 12-31-2018 01:51 PM
Environment: HDP 2.6.4
Ambari – 2.6.1
3 ZooKeeper servers
Hi all,
On the first ZooKeeper server, connections to ZooKeeper do not seem to be closed even after the clients close them.
This causes the maximum number of client connections from a single host to be reached (we have maxClientCnxns set to 60 in the ZooKeeper config).
As a result, when a new application tries to create a connection, it fails.
Example, while the connection count is still below the limit:
echo stat | nc 23.1.35.185 2181
Latency min/avg/max: 0/71/399
Received: 3031
Sent: 2407
Connections: 67
Outstanding: 622
Zxid: 0x130000004d
Mode: follower
Node count: 3730
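Since the stat reply is plain "Key: value" text, the counters worth tracking here (Connections, Outstanding, Mode) can be pulled out with awk and logged over time. A minimal sketch; zk_stat_summary is a hypothetical helper name, and it assumes that a reset connection (as in the next output) simply yields no output from nc.

```shell
# zk_stat_summary (hypothetical helper): reads the reply of the ZooKeeper
# "stat" four-letter word on stdin and prints the key counters on one line.
zk_stat_summary() {
  awk -F': ' '
    $1 == "Connections" { c = $2 }
    $1 == "Outstanding" { o = $2 }
    $1 == "Mode"        { m = $2 }
    END {
      # No "Mode:" line means no usable reply (e.g. connection reset).
      if (m == "") { print "no stat reply"; exit 1 }
      printf "connections=%s outstanding=%s mode=%s\n", c, o, m
    }'
}

# Usage against the first server from this thread:
#   echo stat | nc 23.1.35.185 2181 | zk_stat_summary
```

Running it in a loop (for example from cron) gives a simple history of how fast Connections grows toward the limit.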
But after some time, when the connection count reaches ~70, we see:
echo stat | nc 23.1.35.185 2181
Ncat: Connection reset by peer.
We can also see many connections in CLOSE_WAIT:
java 58936 zookeeper 60u IPv6 381963738 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44983 (CLOSE_WAIT)
java 58936 zookeeper 61u IPv6 381963798 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:45034 (CLOSE_WAIT)
java 58936 zookeeper 62u IPv6 381963667 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44956 (CLOSE_WAIT)
java 58936 zookeeper 63u IPv6 381949363 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44911 (CLOSE_WAIT)
java 58936 zookeeper 64u IPv6 381964358 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44957 (CLOSE_WAIT)
java 58936 zookeeper 65u IPv6 381963638 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44940 (CLOSE_WAIT)
java 58936 zookeeper 66u IPv6 381963640 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44943 (CLOSE_WAIT)
java 58936 zookeeper 67u IPv6 381963642 0t0 TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44945 (CLOSE_WAIT)
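CLOSE_WAIT means the remote end has already closed its side of the connection (sent FIN) and the local process has not yet closed its socket. In the lsof lines above the local port is eforward, which is the /etc/services name for port 2181, and the owner is a java process running as the zookeeper user (PID 58936), presumably the ZooKeeper server itself. To watch how the per-state counts evolve, here is a minimal sketch that tallies the trailing (STATE) column of lsof output; count_tcp_states is a hypothetical helper name.

```shell
# count_tcp_states (hypothetical helper): tallies TCP states from lsof output
# on stdin. lsof prints the state as the last field, e.g. "(CLOSE_WAIT)".
count_tcp_states() {
  awk '$NF ~ /^\([A-Z_0-9]+\)$/ {
         s = $NF
         gsub(/[()]/, "", s)
         n[s]++
       }
       END { for (s in n) print s, n[s] }' | LC_ALL=C sort
}

# Usage (-n and -P keep hosts and ports numeric, so 2181 shows instead of "eforward";
# run as root to see sockets of all users):
#   lsof -nP -iTCP:2181 | count_tcp_states
```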
From the ZooKeeper log:
2018-12-26 02:50:46,382 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.185 - max is 60
2018-12-26 02:50:46,429 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.197 - max is 60
2018-12-26 02:50:46,849 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.187 - max is 60
2018-12-26 02:50:47,645 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.197 - max is 60
2018-12-26 02:50:47,845 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.185 - max is 60
2018-12-26 02:50:48,180 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.187 - max is 60
2018-12-26 02:50:49,035 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.185 - max is 60
2018-12-26 02:50:49,375 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.187 - max is 60
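Each of these warnings is one rejected connection attempt. To see which client hosts hit the limit most often, a minimal sketch that counts the warnings per source IP; it also works when several entries are fused onto one line. The helper name too_many_by_host is hypothetical, and the log path in the usage line is a placeholder.

```shell
# too_many_by_host (hypothetical helper): counts "Too many connections from /IP"
# warnings per client IP, most frequent first. Reads the given files, or stdin.
too_many_by_host() {
  grep -ho 'Too many connections from /[0-9.]*' "$@" |
    awk '{ sub(/.*from \//, ""); n[$0]++ }
         END { for (ip in n) print n[ip], ip }' |
    LC_ALL=C sort -k1,1nr -k2,2
}

# Usage (replace the path with your ZooKeeper log file):
#   too_many_by_host /path/to/zookeeper.log
```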
In Ambari we can also see:
Connection failed: [Errno 104] Connection reset by peer to zookeper_server.sys54.com.:2181
I should mention that this is not happening on ZooKeeper servers 2 and 3.
So, any hint why the connections are stuck in CLOSE_WAIT?
NOTE: increasing maxClientCnxns to 300 does not help, because after some time we get more than 300 connections and then we see in the log:
2018-12-26 02:50:49,375 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.187 - max is 60
Created 12-31-2018 03:17 PM
The maxClientCnxns property in zoo.cfg is used by the ZooKeeper server to limit the number of concurrent connections from a single host (IP address). By default, this limit is 60.
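To confirm which limit a server is configured with, the value can be read from zoo.cfg, falling back to the default of 60 when the property is absent. A minimal sketch; zk_max_client_cnxns is a hypothetical helper name, and the path you pass depends on your installation. Per the ZooKeeper admin guide a value of 0 removes the limit; the sketch prints it unchanged. It reads the file only, which may differ from what the running server loaded until it is restarted.

```shell
# zk_max_client_cnxns (hypothetical helper): prints the maxClientCnxns value
# from the given zoo.cfg, or 60 (the default) when the property is not set.
zk_max_client_cnxns() {
  awk -F= '
    /^[ \t]*#/ { next }                 # skip comment lines
    {
      key = $1
      gsub(/[ \t\r]/, "", key)
      if (key == "maxClientCnxns") {
        v = $2
        gsub(/[ \t\r]/, "", v)
      }
    }
    END { print (v == "" ? 60 : v) }' "$1"
}

# Usage:
#   zk_max_client_cnxns /path/to/zoo.cfg
```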
When this limit is reached, new connections to the ZooKeeper server from that host are immediately dropped. This limiting can be observed in the ZooKeeper log, and the offending applications can be identified with network tools such as netstat. Changes to maxClientCnxns require a restart of the ZooKeeper server to take effect.
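Following the netstat suggestion, a minimal sketch to run on the ZooKeeper server that groups connections on the client port by remote IP and TCP state, so the hosts holding the most sockets (and how many of those are in CLOSE_WAIT) stand out. It assumes the Linux netstat -tn column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); zk_clients_by_ip is a hypothetical helper name.

```shell
# zk_clients_by_ip (hypothetical helper): reads `netstat -tn` output on stdin
# and counts sockets whose local port is the ZooKeeper client port (default
# 2181), grouped by remote IP and TCP state, largest groups first.
zk_clients_by_ip() {
  awk -v p=":${1:-2181}" '
    $1 ~ /^tcp/ && $6 != "LISTEN" &&
    substr($4, length($4) - length(p) + 1) == p {
      ip = $5
      sub(/:[0-9]+$/, "", ip)           # strip the remote port
      n[ip " " $6]++
    }
    END { for (k in n) print n[k], k }' | LC_ALL=C sort -k1,1nr -k2
}

# Usage on the ZooKeeper server:
#   netstat -tn | zk_clients_by_ip 2181
```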
Too many connections from /23.1.35.185 - max is 60
2018-12-26 02:50:46,429 [myid:1] Too many connections from /23.1.35.197 - max is 60
2018-12-26 02:50:46,849 [myid:1] Too many connections from /23.1.35.187 - max is 60
2018-12-26 02:50:47,645 [myid:1]
Possible solutions
HTH
Created 12-31-2018 03:51 PM
We checked all your comments and don't see a problem except step 1; you said "
Created 12-31-2018 04:40 PM
Created 12-31-2018 04:58 PM
Replace x.x.x.x with your ZooKeeper server IP:
netstat -nape | awk '{if($5 =="x.x.x.x:2181")print $4, $9;}'
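The command above prints, for every socket whose foreign address ($5) is the ZooKeeper server, the local address ($4) and the owning PID/Program name ($9) from netstat -nape. A variant that also keeps the state column ($6) and counts sockets per process makes snapshots taken minutes apart easier to compare. A "-" in the PID column means netstat could not attribute the socket to a process, which typically happens for sockets in TIME_WAIT or, when not running as root, for processes owned by another user. Sketch only; zk_client_sockets_by_pid is a hypothetical helper name.

```shell
# zk_client_sockets_by_pid (hypothetical helper): reads `netstat -nape` output
# on stdin and counts sockets connected to the given ZooKeeper address,
# grouped by PID/Program name ($9) and TCP state ($6), largest groups first.
zk_client_sockets_by_pid() {
  awk -v t="$1" '
    $5 == t { n[$9 " " $6]++ }
    END { for (k in n) print n[k], k }' | LC_ALL=C sort -k1,1nr -k2
}

# Usage (replace x.x.x.x with your ZooKeeper IP; run as root to see all PIDs):
#   netstat -nape | zk_client_sockets_by_pid x.x.x.x:2181
```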
Please let me know.
HTH
Created 12-31-2018 05:11 PM
I get this (after restarting the ZooKeeper service from Ambari):
netstat -nape | awk '{if($5 =="23.1.35.197:2181")print $4, $9;}'
23.1.35.197:34065 -
23.1.35.197:34071 -
23.1.35.197:34053 -
23.1.35.197:34066 -
23.1.35.197:34068 -
23.1.35.197:34079 63468/java
23.1.35.197:34082 63468/java
23.1.35.197:34052 -
23.1.35.197:34063 -
23.1.35.197:34069 -
23.1.35.197:34075 63468/java
23.1.35.197:34084 63468/java
23.1.35.197:34061 -
23.1.35.197:34078 63468/java
Created 12-31-2018 05:23 PM
And after 10 minutes we get this:
netstat -nape | awk '{if($5 =="23.1.35.197:2181")print $4, $9;}'
23.1.35.197:34425 -
23.1.35.197:34416 -
23.1.35.197:34392 -
23.1.35.197:34389 -
23.1.35.197:34462 63468/java
23.1.35.197:34401 -
23.1.35.197:34358 -
23.1.35.197:34437 -
23.1.35.197:34361 -
23.1.35.197:34451 63468/java
23.1.35.197:34354 -
23.1.35.197:34360 -
23.1.35.197:34368 -
23.1.35.197:34444 -
23.1.35.197:34459 63468/java
23.1.35.197:34442 -
23.1.35.197:34391 -
23.1.35.197:34440 -
23.1.35.197:34452 63468/java
Created 12-31-2018 06:05 PM
We also get this from the ZooKeeper service check:
Welcome to ZooKeeper!
JLine support is enabled
[zk: zookeper_server.sys54.com:2181(CONNECTING) 0] ls /
Command failed after 1 tries
Created 12-31-2018 06:14 PM
We noticed the following:
/usr/hdp/2.6.4.0-91/zookeeper/bin/zkCli.sh
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled
[zk: localhost:2181(CONNECTING) 0]   <-- this should be CONNECTED, not CONNECTING
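A zkCli prompt stuck at CONNECTING means the client never completed a session with the server. To script that check, a minimal sketch that sends the ruok four-letter word, to which a running server replies imok. Keep in mind that from a host that has already hit maxClientCnxns the probe itself will be rejected (the same "Connection reset by peer" as above), so a failure alone does not tell an unhealthy server apart from an exhausted per-host limit. The helper name zk_ruok and the 5-second timeout are assumptions.

```shell
# zk_ruok (hypothetical helper): succeeds only if the server at host:port
# (port defaults to 2181) answers the "ruok" four-letter word with "imok".
zk_ruok() {
  reply=$(printf 'ruok' | nc -w 5 "$1" "${2:-2181}" 2>/dev/null)
  [ "$reply" = "imok" ]
}

# Usage:
#   if zk_ruok localhost 2181; then echo "imok"; else echo "no imok reply"; fi
```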