I am getting the two alerts below from ZooKeeper.
1) Bad: The ZooKeeper service canary failed for an unknown reason
2) Bad: Quorum Membership status could not be detected for the last 3 minutes. The last connection attempt to the ZooKeeper server to determine the quorum membership status failed.
ZooKeeper is not down; I have verified that it is working as expected. I have only 1 ZooKeeper instance in my cluster (1 Master, 3 Data, 1 Edge).
I am seeing the error messages below in the log.
Hi @Seeker90 ,
The ERROR message that you see appears because you are running ZK in standalone mode; it is more of a warning than an error:
Invalid configuration, only one server specified (ignoring)
Further, I see that ZK started properly; however, it throws an exception while reading snapshots.
Probable causes of the canary test failure and the quorum membership alert:
1. Max client connections is set too low
2. Long fsyncs (disk writes)
3. Insufficient heap (long GCs)
Try the following:
1. Increase the ZK heap size (if the heap is undersized, or if the snapshots are large, increasing the heap is a good starting point).
2. Increase the maximum number of client connections to 300.
3. grep for "fsync" in the ZK logs, and check whether the ZK data directory is on an independent disk.
Does that answer your question? Do let us know.
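As a sketch of suggestion 2, the connection limit is the `maxClientCnxns` property in `zoo.cfg` (the path below is a common default, not necessarily yours; on a Cloudera-managed cluster you would set the equivalent value in Cloudera Manager instead):

```properties
# zoo.cfg (hypothetical path: /etc/zookeeper/conf/zoo.cfg -- adjust to your install)
# Per-client-IP connection limit; 300 matches suggestion 2 above
maxClientCnxns=300
```

For suggestion 1, on an upstream (non-managed) install the heap is typically raised via `JVMFLAGS` in `conf/java.env` (e.g. `export JVMFLAGS="-Xmx1g"`); the exact mechanism depends on how your ZooKeeper is deployed.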
By default, ZooKeeper servers are supposed to run in an ensemble (from the French for "together"); the majority of that ensemble is called a quorum.
The minimum number of servers required to run ZooKeeper is called the quorum.
ZooKeeper replicates the whole data tree to all of the quorum servers. The quorum size is also the minimum number of servers that must store a client's data before the client is told it is safely stored.
The quorum size should be calculated by the majority rule:
Majority rule: QN = (N + 1) / 2
QN: minimum number of servers in the quorum
N: total number of servers (should be an odd number)
So, if we have 5 servers, the quorum should be a minimum of 3 servers.
As long as a majority of the ensemble is up, the service will be available. Because ZooKeeper requires a majority, it is best to use an odd number of machines.
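The majority rule above can be sketched in a few lines of Python (illustrative only; `quorum_size` is our own helper name, not a ZooKeeper API):

```python
def quorum_size(n: int) -> int:
    """Minimum number of servers that must agree in an ensemble of n.

    For odd n this equals (n + 1) / 2 from the majority rule above.
    """
    return n // 2 + 1

for n in (1, 3, 5):
    print(n, quorum_size(n))
# An ensemble of n servers tolerates n - quorum_size(n) failures:
# 1 server tolerates 0, 3 tolerate 1, 5 tolerate 2.
```

This also shows why a single-server "ensemble" (as in the question) tolerates zero failures: its quorum is itself.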
ZooKeeper uses quorums by default to prevent the "split-brain" phenomenon: only a group of more than half of the nodes in the cluster can elect a leader, which guarantees the leader's uniqueness.
Split-brain can happen when you have two nodes in a cluster and both know that a master needs to be elected. When communication between them works, they reach consensus and one of them is elected master. But if communication between them fails, each node concludes that there is no master, so each elects itself, and the cluster ends up with two masters.
So, to avoid the problem you are facing, add 2 more ZooKeeper servers, and those alerts should go away.