ZooKeeper server shuts down after some time

We have an Ambari Hadoop cluster, version 2.6.4, with 3 ZooKeeper servers (version 3.4.x).

The first ZooKeeper server does not work as it should and stops after some time.

In the Ambari GUI we can see that the ZooKeeper server is disconnected (screenshot: 76647-capture.png).

In the ZooKeeper log we can see the following:

<code>java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2018-06-12 18:35:01,856 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2018-06-12 18:35:01,857 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2018-06-12 18:35:01,857 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2018-06-12 18:35:01,857 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2018-06-12 18:35:01,857 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2018-06-12 18:35:01,857 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
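
To see when the flood of errors starts and how dense it is, the ERROR lines can be counted per minute. The following is a minimal sketch, assuming the log lines start with a timestamp in the format shown above; `errors_per_minute` is a hypothetical helper name and the log path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Count ERROR log lines per minute, reading a ZooKeeper log on stdin.
# errors_per_minute is a hypothetical helper, not part of ZooKeeper.
errors_per_minute() {
    awk '
        # Match lines such as "2018-06-12 18:35:01,856 - ERROR [...]".
        /^[0-9-]+ [0-9:,]+ - ERROR / {
            # The first 16 characters are "YYYY-MM-DD HH:MM".
            n[substr($0, 1, 16)]++
        }
        END { for (m in n) print m, n[m] }
    ' | sort
}

# Usage (path is a placeholder):
#   errors_per_minute < /path/to/zookeeper.log
```

A sudden jump in the per-minute count can then be compared with the time at which the server was reported as disconnected.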

When we test ZooKeeper with the stat command, we get the following:

<code>echo stat | nc 14.42.169 2181

Latency min/avg/max: 0/10/2727
Received: 600879
Sent: 103803
Connections: 30
Outstanding: 546
Zxid: 0x3e000048c3
Mode: follower
Node count: 43296
  • Note that Sent is much lower than Received!
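
To track this gap over time rather than reading it by eye, the counters can be extracted from the stat output. The following is a rough sketch, assuming the "Name: value" layout shown above; `zk_stat_summary` is a hypothetical helper name and `ZK_HOST` is a placeholder for the server address.

```shell
#!/bin/sh
# Summarise the counters from ZooKeeper `stat` output read on stdin.
# zk_stat_summary is a hypothetical helper, not part of ZooKeeper.
zk_stat_summary() {
    awk -F': ' '
        $1 == "Received"    { recv = $2 }
        $1 == "Sent"        { sent = $2 }
        $1 == "Outstanding" { out  = $2 }
        END {
            # gap is simply Received minus Sent.
            printf "received=%d sent=%d gap=%d outstanding=%d\n",
                   recv, sent, recv - sent, out
        }
    '
}

# Usage (ZK_HOST is a placeholder):
#   echo stat | nc "$ZK_HOST" 2181 | zk_stat_summary
```

Run periodically, this shows whether the gap and the Outstanding count keep growing until the server stops.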

We can also see many connections in the CLOSE-WAIT state:

<code>#  ss -anop | grep 2181 | grep CLOSE | awk '{print $1" "$2}' | more
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
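
CLOSE-WAIT means the remote side has closed the connection but the local process has not yet closed its socket, so a growing count on port 2181 points at the ZooKeeper process itself. The following is a small sketch for counting connections per TCP state, assuming the column layout of `ss -ant` (state in the first column, local address:port in the fourth); `count_states` is a hypothetical helper name.

```shell
#!/bin/sh
# Count TCP connections per state for one local port.
# Reads `ss -ant` output on stdin; count_states is a hypothetical helper.
count_states() {
    awk -v port="$1" '
        # Column 4 is the local address:port; keep rows ending in ":<port>".
        $4 ~ (":" port "$") { n[$1]++ }
        END { for (s in n) print s, n[s] }
    ' | sort
}

# Usage:
#   ss -ant | count_states 2181
```

Comparing the CLOSE-WAIT count with the ESTAB count over time shows whether sockets are accumulating faster than they are released.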

To try to resolve this issue, we did the following, without success:

  1. Increased the Java heap size to 8G (only on ZooKeeper)
  2. Increased zookeeper.session.timeout.ms on Kafka

Neither change helped.

Please advise what could be the reason for this issue.

Michael-Bronson