Created on 06-12-2018 06:29 AM - edited 08-17-2019 07:29 PM
In the ZooKeeper logs we get the following warning:

Too many connections from /10.54.23.11 - max is 60

We also get disconnection issues from ZooKeeper. Is it possible to increase the "60" value? What should the resolution be for this?
From the ZooKeeper log:
2018-06-12 06:31:56,901 - ERROR [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory$1@44] - Thread Thread[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181,5,main] died
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:934)
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2018-06-12 06:18:16,687 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /10.164.46.204:52745 which had sessionid 0x163f299c1e20003
2018-06-12 06:18:16,687 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client: 0x163f299c1e20003
2018-06-12 06:18:16,688 - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@617] - Established session 0x163f299c1e20003 with negotiated timeout 40000 for client /10.164.46.204:52835
2018-06-12 06:18:16,878 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /10.54.23.11 - max is 60
2018-06-12 06:18:17,550 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.164.46.170:42523
2018-06-12 06:18:17,550 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861] - Client attempting to renew session 0x163f299c1e20004 at /10.54.23.12:42523
2018-06-12 06:18:17,550 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /10.164.46.170:42495 which had sessionid 0x163f299c1e20004
2018-06-12 06:18:17,551 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client: 0x163f299c1e20004
2018-06-12 06:18:17,552 - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@617] - Established session 0x163f299c1e20004 with negotiated timeout 40000 for client /10.164.46.170:42523
2018-06-12 06:18:18,296 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /10.54.23.11 - max is 60
2018-06-12 06:18:19,695 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /10.54.23.11 - max is 60
2018-06-12 06:18:21,042 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /10.54.23.11 - max is 60
2018-06-12 06:18:21,761 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /10.54.23.11 - max is 60
2018-06-12 06:18:22,408 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /10.54.23.11 - max is 60
2018-06-12 06:18:23,561 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /10.54.23.11 - max is 60
2018-06-12 06:18:24,192 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /10.54.23.11 - max is 60
^C
[root@master01 zookeeper]# host 10.54.23.11
169.46.164.10.in-addr.arpa domain name pointer master01.sys76.com.
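For context on the "max is 60" message: that cap is ZooKeeper's maxClientCnxns setting, the per-client-IP connection limit, which defaults to 60. It can be raised in zoo.cfg (on an Ambari-managed cluster, change it through Ambari so it is not overwritten), but note that raising it usually just hides a connection leak in the client application. A minimal sketch:

# zoo.cfg: the per-IP client connection cap behind "Too many connections ... max is 60".
# 0 disables the limit entirely; 200 here is an arbitrary illustrative value.
maxClientCnxns=200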
Created 06-12-2018 07:01 AM
Created 06-12-2018 07:55 AM
On master01 we have 172 connections, and on master02/03 we have ~50.
Created 06-12-2018 08:18 AM
The document I referenced should give you the steps to follow, like analyzing the offending application. If it's the same cluster, then it could be the NameNode:
netstat -nape | awk '{if($5 == "IP_of_master01:2181") print $4, $9;}'
Do the same for master03, then maybe use a bash script to kill the dead processes; see the sketch below for finding the offending host first.
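As a complement, here is a minimal sketch that, run on the ZooKeeper server itself, counts established 2181 connections per client IP to spot the offender. It assumes the net-tools netstat column layout ($4 = local address, $5 = remote address, $6 = state) and IPv4 addresses:

#!/usr/bin/env bash
# Count established ZooKeeper client connections per remote IP (run on the ZK host).
netstat -nat \
  | awk '$4 ~ /:2181$/ && $6 == "ESTABLISHED" { split($5, a, ":"); print a[1] }' \
  | sort | uniq -c | sort -rn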
Created 06-12-2018 10:09 AM
Do you mean killing the connections in TIME_WAIT / CLOSE_WAIT state?
Created 06-12-2018 10:42 AM
We killed the CLOSE_WAIT connections and restarted ZooKeeper, but the ZooKeeper connection count is still above the max. What can we do about this?
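One way to see whether the sockets are really piling up in CLOSE_WAIT again, rather than being live sessions, is to break the 2181 sockets down by TCP state, e.g. (same net-tools netstat assumption as above):

# Tally 2181 sockets by TCP state on the ZooKeeper host.
netstat -nat | awk '$4 ~ /:2181$/ { print $6 }' | sort | uniq -c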
Created 06-12-2018 05:45 PM
I think this should be linked to the NameNode issues. Speaking of which, I have failed to reproduce the scenario, but I am still investigating.
Created 06-12-2018 05:57 PM
Geoffrey, this cluster is a different cluster, not the cluster with the NameNode issue.
Created 06-12-2018 06:07 PM
I can kill all CLOSE_WAIT owners with:

lsof -i :2181 | grep CLOSE_WAIT | awk '{print $2}' | sort -u | xargs kill

(note: sort -u instead of plain uniq, since uniq only removes adjacent duplicates), but this isn't a solution. Why doesn't ZooKeeper close them?
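A CLOSE_WAIT socket means the peer has already closed its end but the local process never called close(), so the leak lives in whichever process owns the socket. Rather than killing blindly, a sketch to identify the owners first (state filtering via -sTCP:... requires a reasonably recent lsof; verify on your platform):

# List command name and PID for every process holding a CLOSE_WAIT socket on 2181.
lsof -iTCP:2181 -sTCP:CLOSE_WAIT | awk 'NR > 1 { print $1, $2 }' | sort -u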
Created 06-12-2018 06:50 PM
See also the following details (Sent is much lower than Received); can this tell us something about the problem? This happens only on the first ZooKeeper server.
echo stat | nc 10414.42.169 2181
Latency min/avg/max: 0/10/2727
Received: 600879
Sent: 103803
Connections: 30
Outstanding: 546
Zxid: 0x3e000048c3
Mode: follower
Node count: 43296
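Outstanding: 546 is the number of requests this follower has received but not yet answered; a persistently high value suggests the server is not keeping up with its clients, which fits the disconnect symptoms. A small sketch to watch that backlog across the ensemble (mntr is a standard four-letter command in ZooKeeper 3.4+; master01-03 are the hostnames from this thread):

# Print the outstanding-request backlog for each server in the ensemble.
for h in master01 master02 master03; do
  printf '%s: ' "$h"
  echo mntr | nc "$h" 2181 | grep zk_outstanding_requests
done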