
Too many connections on ZooKeeper server

Environment: HDP 2.6.4

Ambari 2.6.1

3 ZooKeeper servers



Hi all,

On the first ZooKeeper server, it seems that connections to ZooKeeper are not getting closed even after the clients close them. This causes the maximum number of client connections from a host to be reached (we have maxClientCnxns set to 60 in the ZooKeeper config).

As a result, when a new application tries to create a connection, it fails.

For example, here is the stat output showing the connection count:

echo stat | nc 23.1.35.185 2181

Latency min/avg/max: 0/71/399
Received: 3031
Sent: 2407
Connections: 67
Outstanding: 622
Zxid: 0x130000004d
Mode: follower
Node count: 3730
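To track these counters over time instead of reading them by eye, the stat reply can be fetched and parsed with a few lines of Python. This is a minimal sketch, assuming Python 3 and a server that answers the stat four-letter word on port 2181, exactly as the `echo stat | nc` command above does; `fetch_stat` and `parse_stat` are hypothetical helper names, not part of ZooKeeper.

```python
import socket


def fetch_stat(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send the 'stat' four-letter word and return the raw reply, like `echo stat | nc`."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stat")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_stat(text: str) -> dict:
    """Map each 'Key: value' line of a stat reply to a string value."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip blank lines, section headers such as 'Clients:' and indented
        # per-client lines such as ' /10.0.0.1:5000[1](queued=0,...)'.
        if sep and key and not key.startswith(" ") and value.strip():
            result[key.strip()] = value.strip()
    return result


# Usage, with the host from the post:
#   stats = parse_stat(fetch_stat("23.1.35.185"))
#   print(stats["Connections"], stats["Outstanding"])
```

Keep in mind that Connections in the stat reply is the total for this server across all client hosts, while maxClientCnxns is enforced per client IP, so a total of 67 does not by itself mean that one host has reached 60.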

But after some time, when the connection count reaches about 70, we see:

echo stat | nc 23.1.35.185 2181

Ncat: Connection reset by peer.

We can also see many sockets in CLOSE_WAIT state:

java      58936       zookeeper   60u  IPv6 381963738      0t0  TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44983 (CLOSE_WAIT)
java      58936       zookeeper   61u  IPv6 381963798      0t0  TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:45034 (CLOSE_WAIT)
java      58936       zookeeper   62u  IPv6 381963667      0t0  TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44956 (CLOSE_WAIT)
java      58936       zookeeper   63u  IPv6 381949363      0t0  TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44911 (CLOSE_WAIT)
java      58936       zookeeper   64u  IPv6 381964358      0t0  TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44957 (CLOSE_WAIT)
java      58936       zookeeper   65u  IPv6 381963638      0t0  TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44940 (CLOSE_WAIT)
java      58936       zookeeper   66u  IPv6 381963640      0t0  TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44943 (CLOSE_WAIT)
java      58936       zookeeper   67u  IPv6 381963642      0t0  TCP zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44945 (CLOSE_WAIT)
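Some background on the state itself: CLOSE_WAIT means the remote end has already closed its side of the TCP connection and the kernel is waiting for the local process to close its socket. In this lsof output the local process is the ZooKeeper JVM (PID 58936) and `eforward` is the /etc/services name for port 2181, so these are server-side sockets whose clients have gone away but which the server has not closed yet. To check whether they accumulate, the sketch below counts CLOSE_WAIT lines per PID and remote host. It assumes Python 3 and lsof-style lines exactly like the ones shown (the command that produced them is not given in the post); `count_close_wait` is a hypothetical helper.

```python
from collections import Counter


def count_close_wait(lsof_output: str) -> Counter:
    """Count CLOSE_WAIT sockets per (PID, remote host) from lsof-style lines."""
    counts = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Expected layout: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME (STATE)
        if len(fields) < 10 or fields[-1] != "(CLOSE_WAIT)":
            continue
        pid = fields[1]
        _local, _, remote = fields[-2].partition("->")  # NAME is local->remote
        remote_host = remote.rsplit(":", 1)[0]           # drop the remote port
        counts[(pid, remote_host)] += 1
    return counts
```

Running it on two captures a few minutes apart shows whether the count for the ZooKeeper PID keeps growing or drains on its own.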


From the ZooKeeper log:

2018-12-26 02:50:46,382 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.185 - max is 60
2018-12-26 02:50:46,429 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.197 - max is 60
2018-12-26 02:50:46,849 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.187 - max is 60
2018-12-26 02:50:47,645 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.197 - max is 60
2018-12-26 02:50:47,845 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.185 - max is 60
2018-12-26 02:50:48,180 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.187 - max is 60
2018-12-26 02:50:49,035 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.185 - max is 60
2018-12-26 02:50:49,375 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.187 - max is 60
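These warnings come from three client hosts. A short script can tally them per IP over a whole log file, which makes it easier to see which host to investigate first. This is a sketch assuming Python 3 and the message text shown above; `rejected_per_host` is a hypothetical helper, and the log path in the usage comment is an assumption. For the eight lines above it should report 3 warnings for 23.1.35.185, 3 for 23.1.35.187 and 2 for 23.1.35.197.

```python
import re
from collections import Counter

# Matches the warning ZooKeeper logs when a host exceeds maxClientCnxns, e.g.
# "... - Too many connections from /23.1.35.185 - max is 60" (IPv4 only).
# \s+ between words also tolerates messages that were wrapped across lines.
TOO_MANY = re.compile(r"Too\s+many\s+connections\s+from\s+/([0-9.]+)")


def rejected_per_host(log_text: str) -> Counter:
    """Count 'Too many connections' warnings per client IP address."""
    return Counter(m.group(1) for m in TOO_MANY.finditer(log_text))


# Usage (hypothetical log path; adjust to your installation):
#   with open("/var/log/zookeeper/zookeeper.log") as f:
#       print(rejected_per_host(f.read()).most_common())
```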

In Ambari we also see:

Connection failed: [Errno 104] Connection reset by peer to zookeper_server.sys54.com.:2181
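On Linux, errno 104 is ECONNRESET, the same "connection reset by peer" that Ncat printed: the server accepted the TCP connection and then dropped it, which is consistent with the "Too many connections" warnings in the log rather than with a server that is down. A small probe can tell these cases apart. This is a sketch assuming Python 3; it sends the ruok four-letter word, which a healthy server answers with imok, and `probe` is a hypothetical helper name.

```python
import socket


def probe(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send 'ruok' and classify the result.

    Returns 'imok' (healthy), 'reset' (connection accepted, then dropped),
    'refused' (nothing listening), 'timeout', 'closed' (empty reply) or the raw reply.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = sock.recv(16)
    except (ConnectionResetError, BrokenPipeError):
        return "reset"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    return reply.decode("ascii", errors="replace") or "closed"
```

Run from each client host, a "reset" answer on server 1 alongside "imok" on servers 2 and 3 would match the symptoms described here.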

I should mention that this is not happening on ZooKeeper servers 2 and 3.

So, any hints on why the connections stay in CLOSE_WAIT?

NOTE: increasing maxClientCnxns to 300 does not help, because after some time we get more than 300 connections, and then we see this in the log:

2018-12-26 02:50:49,375 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /23.1.35.187 - max is 60
Michael-Bronson

Mentor

@Michael Bronson

The maxClientCnxns property in zoo.cfg is used by each ZooKeeper server to limit the number of concurrent connections from a single client host (identified by its IP address). By default, this limit is 60.

When this limit is reached, new connections from that host to the ZooKeeper server are dropped immediately. This limiting can be observed in the ZooKeeper log, and the offending applications can be identified with network tools such as netstat. Changes to maxClientCnxns require a restart of the ZooKeeper server.
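To make the per-host semantics concrete, here is a toy model of such a cap. It is a simplification written for illustration, not ZooKeeper's actual NIOServerCnxnFactory code; `PerHostLimiter` and `first_rejection` are hypothetical names, and the model assumes that 0 means "no limit", as it does for maxClientCnxns. The second function shows what happens when a client keeps opening connections without closing them.

```python
from collections import Counter


class PerHostLimiter:
    """Toy model of a per-client-IP connection cap like maxClientCnxns (0 = unlimited)."""

    def __init__(self, max_client_cnxns: int = 60):
        self.max = max_client_cnxns
        self.open = Counter()  # client IP -> currently open connections

    def accept(self, ip: str) -> bool:
        if self.max > 0 and self.open[ip] >= self.max:
            return False  # in ZooKeeper, this is where "Too many connections" is logged
        self.open[ip] += 1
        return True

    def close(self, ip: str) -> None:
        if self.open[ip] > 0:
            self.open[ip] -= 1


def first_rejection(limit: int, leaked_per_minute: int) -> int:
    """Minute at which a client that never closes its connections is first rejected."""
    if limit <= 0 or leaked_per_minute <= 0:
        raise ValueError("with no limit or no leak, the client is never rejected")
    server = PerHostLimiter(limit)
    minute = 0
    while True:
        minute += 1
        for _ in range(leaked_per_minute):
            if not server.accept("10.0.0.1"):  # hypothetical client IP
                return minute
```

In this model, a client leaking one connection per minute is first rejected at minute 61 with a limit of 60 and at minute 301 with a limit of 300: raising the limit only postpones the failure, which is consistent with the NOTE in the question.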

Too many connections from /23.1.35.185 - max is 60
Too many connections from /23.1.35.197 - max is 60
Too many connections from /23.1.35.187 - max is 60

Possible solutions

  • Which application is running at that particular time? This can be caused by a bug in user code; identify the offending application with netstat.
  • Ensure that the maximum number of client connections is set high enough to avoid losing connections.
  • Update the value of the maxClientCnxns configuration parameter in the ZooKeeper-installation-directory/conf/zoo.cfg file on every server of the ZooKeeper ensemble.
  • Ensure that you have no system issues with CPU, memory, disk input/output, or other system resources.
  • ZooKeeper is sensitive to NTP problems; make sure the clocks are synchronized across the ensemble.
  • Restart the ZooKeeper servers through Ambari.

HTH

We checked all your suggestions and don't see a problem except with the first one, where you said:

  • "What application is running at that particular time? This can be caused by a bug in user code; check the offending application with netstat."

Can you please suggest how to verify this? (What exactly do we need to look for in the netstat output?)
Michael-Bronson

  • What application is running at that particular time? This can be caused by a bug in user code; check the offending application with netstat. - need to verify
  • Ensure that the maximum number of client connections is set high enough to avoid losing connections. - OK
  • Update the value of the maxClientCnxns configuration parameter in the ZooKeeper-installation-directory/conf/zoo.cfg file on the ZooKeeper ensemble. - OK
  • Ensure that you have no system issues with CPU, memory, disk input/output, or other system resources. - OK
  • ZooKeeper is sensitive to NTP problems; make sure the clocks are synchronized across the ensemble. - OK
  • Restart the ZooKeeper servers through Ambari. - Not needed, since we have restarted several times with the same result
Michael-Bronson

Mentor

@Michael Bronson

Replace x.x.x.x with your ZooKeeper server IP:

netstat -nape | awk '{if($5 =="x.x.x.x:2181")print $4, $9;}'
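The awk one-liner prints column 4 (local address) and column 9 (PID/Program name) of every `netstat -nape` row whose column 5 (foreign address) is the ZooKeeper endpoint. A Python equivalent that also keeps column 6, the TCP state, makes it easier to tell ESTABLISHED sockets from CLOSE_WAIT or TIME_WAIT ones. This is a sketch assuming Python 3 and the Linux net-tools column layout; `filter_netstat` is a hypothetical name.

```python
def filter_netstat(netstat_output: str, zk_endpoint: str):
    """Yield (local address, state, PID/program) for TCP rows whose foreign address is zk_endpoint.

    Assumes Linux `netstat -nape` columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State User Inode PID/Program
    """
    for line in netstat_output.splitlines():
        cols = line.split()
        # Only TCP rows have a State column, so column positions are stable.
        if len(cols) >= 9 and cols[0].startswith("tcp") and cols[4] == zk_endpoint:
            yield cols[3], cols[5], cols[8]


# Usage (replace x.x.x.x as above):
#   import subprocess
#   out = subprocess.run(["netstat", "-nape"], capture_output=True, text=True).stdout
#   for row in filter_netstat(out, "x.x.x.x:2181"):
#       print(*row)
```

The awk command itself can show the same information by printing `$6` as well.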

Please let me know

HTH

I get this (after restarting the ZooKeeper service from Ambari):

netstat -nape | awk '{if($5 =="23.1.35.197:2181")print $4, $9;}'
23.1.35.197:34065 -
23.1.35.197:34071 -
23.1.35.197:34053 -
23.1.35.197:34066 -
23.1.35.197:34068 -
23.1.35.197:34079 63468/java
23.1.35.197:34082 63468/java
23.1.35.197:34052 -
23.1.35.197:34063 -
23.1.35.197:34069 -
23.1.35.197:34075 63468/java
23.1.35.197:34084 63468/java
23.1.35.197:34061 -
23.1.35.197:34078 63468/java


Michael-Bronson

And after 10 minutes we get this:

netstat -nape | awk '{if($5 =="23.1.35.197:2181")print $4, $9;}'
23.1.35.197:34425 -
23.1.35.197:34416 -
23.1.35.197:34392 -
23.1.35.197:34389 -
23.1.35.197:34462 63468/java
23.1.35.197:34401 -
23.1.35.197:34358 -
23.1.35.197:34437 -
23.1.35.197:34361 -
23.1.35.197:34451 63468/java
23.1.35.197:34354 -
23.1.35.197:34360 -
23.1.35.197:34368 -
23.1.35.197:34444 -
23.1.35.197:34459 63468/java
23.1.35.197:34442 -
23.1.35.197:34391 -
23.1.35.197:34440 -
23.1.35.197:34452 63468/java
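Grouping the two snapshots by their last column makes the trend easier to see. Right after the restart there are 14 rows (5 owned by 63468/java, 9 showing "-"), and ten minutes later there are 19 rows (4 owned by 63468/java, 15 showing "-"), so the growth is in sockets for which netstat shows no owning process. In netstat, "-" usually means that no process owns the socket any more (for example a socket in TIME_WAIT) or that the user running netstat is not allowed to see the owner, so running it as root and printing the state column helps narrow this down. The sketch below assumes Python 3 and the two-column `address owner` output of the awk command; `summarize` is a hypothetical helper.

```python
from collections import Counter


def summarize(snapshot: str) -> Counter:
    """Count rows of `address owner` output per owner column ('-' = no owning process shown)."""
    owners = Counter()
    for line in snapshot.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip the command line and blank lines
            owners[parts[1]] += 1
    return owners
```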
Michael-Bronson

We also get this from the ZooKeeper service check:

Welcome to ZooKeeper!
JLine support is enabled
[zk:  zookeper_server.sys54.com:2181(CONNECTING) 0] ls /

Command failed after 1 tries
Michael-Bronson

We noticed the following:

/usr/hdp/2.6.4.0-91/zookeeper/bin/zkCli.sh
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled
[zk: localhost:2181(CONNECTING) 0]   <-- this should be CONNECTED, not CONNECTING
Michael-Bronson