ZooKeeper - Too many connections

In the ZooKeeper logs we get the following warning:

Too many connections from /10.54.23.11 - max is 60

We also get disconnection issues from ZooKeeper. Is it possible to increase the "60" value?

What should the resolution be for this?




From the ZooKeeper log:

2018-06-12 06:31:56,901 - ERROR [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory$1@44] - Thread Thread[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181,5,main] died
java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
        at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:934)
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
2018-06-12 06:18:16,687 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /10.164.46.204:52745 which had sessionid 0x163f299c1e20003
2018-06-12 06:18:16,687 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client: 0x163f299c1e20003
2018-06-12 06:18:16,688 - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@617] - Established session 0x163f299c1e20003 with negotiated timeout 40000 for client /10.164.46.204:52835
2018-06-12 06:18:16,878 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /10.54.23.11 - max is 60
2018-06-12 06:18:17,550 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.164.46.170:42523
2018-06-12 06:18:17,550 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting to renew session 0x163f299c1e20004 at /10.54.23.12:42523
2018-06-12 06:18:17,550 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /10.164.46.170:42495 which had sessionid 0x163f299c1e20004
2018-06-12 06:18:17,551 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client: 0x163f299c1e20004
2018-06-12 06:18:17,552 - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@617] - Established session 0x163f299c1e20004 with negotiated timeout 40000 for client /10.164.46.170:42523
2018-06-12 06:18:18,296 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /10.54.23.11 - max is 60
2018-06-12 06:18:19,695 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /10.54.23.11 - max is 60
2018-06-12 06:18:21,042 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /10.54.23.11 - max is 60
2018-06-12 06:18:21,761 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /10.54.23.11 - max is 60
2018-06-12 06:18:22,408 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /10.54.23.11 - max is 60
2018-06-12 06:18:23,561 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /10.54.23.11 - max is 60
2018-06-12 06:18:24,192 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /10.54.23.11 - max is 60
^C
[root@master01 zookeeper]# host 10.54.23.11
169.46.164.10.in-addr.arpa domain name pointer master01.sys76.com.
Michael-Bronson
10 REPLIES

Mentor

@Michael Bronson

Count the ZooKeeper connections:

$ ss -anop | grep 2181 | wc -l

Look at this HCC document
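The one-liner above gives a single total, while the warning in the log names one specific client address. A minimal sketch that groups the connections by peer IP instead, assuming `ss -tan` style output (local address in column 4, peer address in column 5); the function name `count_zk_peers` is hypothetical:

```shell
# Count TCP connections to a local port, grouped by peer IP.
# Reads `ss -tan` style output on stdin, so it also works on saved output.
count_zk_peers() {
  awk -v port="${1:-2181}" '
    # Skip the header and the listening socket; keep rows whose local
    # address (column 4) ends in ":<port>".
    NR > 1 && $1 != "LISTEN" && $4 ~ (":" port "$") {
      peer = $5
      sub(/:[0-9]+$/, "", peer)   # strip the peer port, keep the IP
      count[peer]++
    }
    END { for (p in count) print count[p], p }
  ' | sort -rn
}

# Usage on a ZooKeeper host:
#   ss -tan | count_zk_peers 2181
```

The "max is 60" in the warning corresponds to ZooKeeper's `maxClientCnxns` setting in `zoo.cfg`, which limits concurrent connections per client IP, so a per-IP count is the number to compare against it.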

On master01 we have 172 connections, and on master02/03 we have ~50.

Michael-Bronson

Mentor

@Michael Bronson

The document I referenced should give you the steps to follow, such as analyzing the offending application. If it's the same cluster, then it could be the NameNode:


netstat -nape | awk '{if ($5 == "IP_of_master01:2181") print $4, $9;}'

Do the same for master03, then maybe use a bash script to kill the dead processes.

Do you mean to kill the connections in TIME-WAIT and CLOSE-WAIT?

Michael-Bronson

We killed the CLOSE-WAIT connections and restarted ZooKeeper, but the ZooKeeper connections are still more than the max. What can we do about this?

Michael-Bronson

Mentor

@Michael Bronson

I think this should be linked to the NameNode issues. Speaking of which, I have failed to reproduce that scenario, but I am still investigating.


Geoffrey, this is a different cluster, not the cluster with the NameNode issue.

Michael-Bronson

I can kill all CLOSE-WAIT with: lsof -i :2181 | grep CLOSE_WAIT | awk '{print $2}' | uniq | xargs kill , but this isn't a solution. Why doesn't ZooKeeper close them?
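A socket sits in CLOSE_WAIT when the remote end has closed the connection and the local process that owns the socket has not yet closed its side, so it helps to see which process holds them before killing anything. Note that the pipeline above kills every process holding such a socket, which on a ZooKeeper host may include the ZooKeeper server itself. A minimal, read-only sketch that only reports the owners, assuming the default `lsof` column layout (COMMAND in column 1, PID in column 2); the function name `close_wait_owners` is hypothetical:

```shell
# Summarize which processes own CLOSE_WAIT sockets.
# Reads `lsof -i` style output on stdin and prints "<count> <pid> <command>".
close_wait_owners() {
  awk '
    # Skip the header; keep only rows whose state is CLOSE_WAIT.
    NR > 1 && /\(CLOSE_WAIT\)/ { n[$2 " " $1]++ }
    END { for (k in n) print n[k], k }
  ' | sort -rn
}

# Usage (-n and -P skip host and port name lookups):
#   lsof -nP -i :2181 | close_wait_owners
```

If most of the entries belong to a single client process, that process is the likelier place to look for connections that are opened and never closed.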

Michael-Bronson

See also the following details (Sent is much less than Received). Can this tell us something about the problem?

This happens only on the first ZooKeeper server.

echo stat | nc 10414.42.169 2181

Latency min/avg/max: 0/10/2727
Received: 600879
Sent: 103803
Connections: 30
Outstanding: 546
Zxid: 0x3e000048c3
Mode: follower
Node count: 43296
Michael-Bronson
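To compare the three servers side by side, the `stat` output can be reduced to a few fields. A minimal sketch, assuming the "Name: value" layout shown above; the function name `zk_stat_summary` and the choice of fields are illustrative only. "Outstanding" is the number of requests queued on the server, so a value that stays high on one node only is worth tracking over time:

```shell
# Reduce ZooKeeper `stat` output to one summary line.
# Reads the four-letter-word response on stdin.
zk_stat_summary() {
  awk -F': *' '
    $1 == "Mode"                { mode = $2 }
    $1 == "Connections"         { conns = $2 }
    $1 == "Outstanding"         { outstanding = $2 }
    $1 == "Latency min/avg/max" { split($2, lat, "/"); maxlat = lat[3] }
    END {
      printf "mode=%s connections=%s outstanding=%s max_latency=%s\n",
             mode, conns, outstanding, maxlat
    }
  '
}

# Usage, with ZK_HOST standing in for each server's address:
#   echo stat | nc "$ZK_HOST" 2181 | zk_stat_summary
```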