I see the following message: "Canary test of client connection to ZooKeeper and execution of basic operations succeeded though a session could not be established with one or more servers." What should I do in this situation?
Also, when I check the logs, I see the following error. Can you help me fix it?
Session 0x0 for server xxxxxxxxxxxxxx/10.4.2.110:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
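One way to narrow down which server is resetting connections is to probe each one on the client port with ZooKeeper's four-letter-word command "ruok", to which a healthy server replies "imok". The following is only a sketch under assumptions: it assumes the four-letter-word commands are enabled on the servers (newer ZooKeeper releases require them to be whitelisted), and the function name four_letter_word is made up for illustration.

```python
import socket


def four_letter_word(host, port=2181, cmd=b"ruok", timeout=5.0):
    """Send a ZooKeeper four-letter-word command and return the reply as text.

    Hypothetical helper: assumes the command is enabled on the server.
    Raises OSError (for example a connection reset or timeout) if the
    server cannot be reached or drops the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        # The server closes the connection after answering, so read to EOF.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", "replace")
```

Calling four_letter_word("10.4.2.110") against each server in turn should return "imok" from the healthy ones; a server that raises an error or returns nothing is the one to look at first.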
We are running ZooKeeper servers on our Google Compute Engine VMs. In Cloudera Manager, the ZooKeeper status says
"ZooKeeper service zookeeper must have an odd number of servers." Naturally, we thought of shutting down one of the servers.
Even after we stopped the ZooKeeper server on one of our nodes, the warning was still there. I have since tried stopping, restarting, and re-initializing the server, but the warning remains. Does anyone have any insight into this problem?
To make things worse, after I shut down the chosen node, I started getting a health warning about that node. When I try restarting the ZooKeeper server on that machine, I get the following error message:
Restart this Server | Server, hadp-inv-ibi-w3 | Finished | Oct 29, 2015 10:23:24 PM UTC | Oct 29, 2015 10:23:36 PM UTC
Command (2133) has failed
Start this Server | Server, hadp-inv-ibi-w3 | Finished | Oct 29, 2015 10:23:24 PM UTC | Oct 29, 2015 10:23:36 PM UTC
Supervisor returned FATAL. Please check the role log file, stderr, or stdout. Program: zookeeper/zkserver.sh ["1","/var/lib/zookeeper"]
Any help would be hugely appreciated.
We have 4 ZooKeeper servers in the cluster. One of my hypotheses was that stopping the ZooKeeper server on one of the 4 nodes would solve the problem.
Since that is not working, what else can we do? Should we try uninstalling ZooKeeper on one of the nodes? If so, could you please tell me how to do
that? I have been searching for a couple of days for resources that would point me in that direction, so far with no luck.
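For reference, the arithmetic behind the odd-number recommendation can be sketched as below. This assumes ZooKeeper's default majority quorum, where more than half of the configured servers must be up; the function names are made up for illustration. What counts is the number of servers configured in the ensemble, so a stopped server presumably still counts toward the total of 4.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, "servers: quorum", quorum_size(n), "tolerates", tolerated_failures(n))
# 3 servers: quorum 2 tolerates 1
# 4 servers: quorum 3 tolerates 1
# 5 servers: quorum 3 tolerates 2
```

Under that assumption, a 4-server ensemble tolerates no more failures than a 3-server one, which is why the fourth server adds no resilience.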
If you need further info on our cluster, here's a screenshot: https://dl.dropboxusercontent.com/u/133690147/Capture.PNG
When I try to restart the w3 server (the server at the bottom of the screenshot above), I get the following error: https://dl.dropboxusercontent.com/u/133690147/Capture2.PNG