Our cluster recently started having a problem with one of our ZooKeeper servers: this server consistently refuses any connection made to ZooKeeper, which has led to problems across the cluster. This is not a network issue, as all other connections are successful. Our investigation found that ZooKeeper's CPU usage on this server is absurdly high, ranging from 300% to 2200% as reported by top. For comparison, on the other servers in the quorum ZooKeeper rarely shows up in top at all, and when it does its CPU usage is <1%.
A misconfiguration of this ZooKeeper server seems unlikely, since the cluster is managed through Ambari and all ZooKeeper servers should therefore have exactly the same configuration. We have restarted ZooKeeper on this machine multiple times with no improvement. Even rebooting the physical host did not help.
We are also having authentication issues on that machine, which may be contributing to the problem. However, the zookeeper user is accessible, and all commands run through Ambari succeed with no permission-denied errors.
Some possibly relevant information:
We are running HDP-188.8.131.52, with ZooKeeper 3.4.6
Our Quorum size is three
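For anyone who wants to compare the servers the same way, here is a minimal sketch of how each quorum member could be probed using ZooKeeper's four-letter commands (`ruok` should be answered with `imok` by a healthy server). It assumes the default client port 2181; the function names `four_letter_word` and `probe` are our own and not part of any ZooKeeper client library. We have not included any output from it here.

```python
import socket


def four_letter_word(host, port=2181, cmd="ruok", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw reply text.

    The port default (2181) is an assumption; use your clientPort setting.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        # The server closes the connection once it has sent its reply,
        # so read until recv() returns an empty bytes object.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def probe(hosts, port=2181):
    """Return a dict mapping each host to its 'ruok' reply or an error string."""
    results = {}
    for host in hosts:
        try:
            results[host] = four_letter_word(host, port, "ruok").strip()
        except OSError as exc:
            # Covers refused connections, timeouts and DNS failures.
            results[host] = "error: %s" % exc
    return results
```

Calling `probe` with the three quorum hostnames should show at a glance which members answer `imok` and which refuse the connection; passing `cmd="stat"` to `four_letter_word` returns the server's own connection and latency summary instead.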
Does anybody have any suggestions for how this can be improved or resolved?