Behavior of 3 node Zookeeper quorum when 1 node fails

Explorer

I have noticed that operations can still continue when ZooKeeper loses 1 node of its 3-node quorum, although my understanding is that it can no longer accept writes because it has no leader (is this right?). I know we have to satisfy the ceil(N/2) requirement, but that is for choosing a leader; it says nothing about how the dependencies (HBase in particular) would be affected.

My question, at its core, is: what is the behavior of ZooKeeper when a 3-node ensemble drops to 2 nodes? Does it turn into a read-only coordinator? As an immediate fix, can one node be assigned master?

Thanks.

Mentor
Loss of one node in a 3-member ZooKeeper quorum is tolerable, because
the 2 remaining machines still count as a majority (out of the
fully identified quorum of 3).

- No loss of functionality will be experienced with the loss of only
one of the peers (in three), two peers (in five), etc.
- If the failed node was the leader, one of the remaining two will
automatically be elected leader.
- Writes can continue to happen, and no special mode is invoked.
- Clients will continue to see the same behaviour they expect out of
ZooKeeper, even in such a situation.
- No manual intervention is required for this procedure; ZK is automatically HA.

The administrator's guide of ZooKeeper covers this BTW:
http://archive.cloudera.com/cdh5/cdh/5/zookeeper/zookeeperAdmin.html
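
The majority arithmetic behind the points above can be sketched in a few lines. This is only an illustration, assuming the default majority-quorum rule; the function names are hypothetical and not part of any ZooKeeper API. Note that a strict majority is floor(N/2) + 1, which matches ceil(N/2) only when N is odd.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority still remains."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(ensemble_size: int, alive: int) -> bool:
    """True if the servers still alive form a strict majority."""
    return alive >= quorum_size(ensemble_size)


if __name__ == "__main__":
    # Columns: ensemble size, quorum size, tolerated failures
    for n in (3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
    # 3 2 1
    # 4 3 1
    # 5 3 2
```

With these definitions, `has_quorum(3, 2)` is true and `has_quorum(3, 1)` is false. A 4-server ensemble tolerates no more failures than a 3-server one, which is why odd ensemble sizes are usually chosen.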

Explorer

Thanks for your response and the documentation. I've read it more carefully now.

 

I have one more scenario to ask about: what does the failure of 2 ZooKeeper servers in a 3-server cluster imply?

  1. Does the final ZooKeeper server shut down (not having a master to communicate with)?
  2. Does the final ZooKeeper server remain as a read-only machine?

I would appreciate it if you could point me to documentation that covers this as well.

Mentor
The loss of the last majority-making member in any N-member quorum
implies a complete failure of quorum. The remaining peers will
shut off their client serving port and go into a 'waiting' mode until
enough members reappear to form a majority and re-elect a leader.

As a result, no clients can connect to the ZK service anymore, for
reads or for writes. Clients will simply receive an ECONNREFUSED
failure at this point. By default there's no 'read-only' mode: every
operation in ZK requires the presence of a quorum. The ZK overview doc
mentions this implicitly
http://archive.cloudera.com/cdh5/cdh/5/zookeeper/zookeeperOver.html:

"""
As long as a majority of the servers are available, the ZooKeeper
service will be available.
"""

Explorer

Thanks, the link gives a 404 because there is a colon on the end, but I was still able to get there. 

New Contributor

ZooKeeper works on quorum, and a quorum requires a majority of the servers. If you have 3 servers and one is down, a majority of the servers is still working. You can read further: Zookeeper Quorum