Behavior of 3 node Zookeeper quorum when 1 node fails
Labels: Apache HBase, Apache Zookeeper
Created on ‎08-07-2014 01:36 PM - edited ‎09-16-2022 02:04 AM
So I have noticed that operations can still continue when ZooKeeper loses 1 node of its 3-node quorum, although I understand that it can no longer accept writes because it has no leader (is this right?). I know we have to satisfy the ceil(N/2) requirement, but that is for choosing a leader; it says nothing about how the dependencies (HBase in particular) would be affected.
My question, at its core, is: what is the behavior of ZooKeeper when it goes from 3 nodes to 2 after a failure? Does it turn into a read-only coordinator? As an immediate fix, can one node be assigned master?
Thanks.
Created ‎08-07-2014 10:24 PM
The 2 remaining machines out of 3 still count as a majority (out of the
fully identified quorum of 3).
- No loss of functionality will be experienced with the loss of only
one of the peers (in three), two peers (in five), etc.
- One of the remaining two will automatically be assigned the leader
role, in case of a leader failure.
- Writes can continue to happen, and no special mode is invoked.
- Clients will continue to see the same behaviour they expect out of
ZooKeeper, even in such a situation.
- No manual intervention is required for this procedure; ZK is automatically HA.
The administrator's guide of ZooKeeper covers this BTW:
http://archive.cloudera.com/cdh5/cdh/5/zookeeper/zookeeperAdmin.html
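The majority rule described above can be checked with a little arithmetic. This is a minimal sketch, assuming the usual majority quorum of floor(N/2) + 1 voting servers; the function names are illustrative and not part of ZooKeeper:

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still alive."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(ensemble_size: int, alive: int) -> bool:
    """True if the surviving servers still form a majority."""
    return alive >= quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# prints:
# 3 2 1
# 4 3 1
# 5 3 2
```

Note that ceil(N/2) only matches the majority size for odd N; with 4 servers it gives 2, which is not a majority. This is also why a 4-server ensemble tolerates no more failures than a 3-server one.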
Created ‎08-08-2014 08:37 AM
Thanks for your response and the documentation. I've read it more carefully now.
I have one more scenario to ask about: What does the failure of 2 ZooKeeper servers in a 3-server cluster imply?
- Does the final ZooKeeper server shut down (not having a master to communicate with)?
- Does the final ZooKeeper server remain as a read-only machine?
I would appreciate it if you could point me to documentation that talks about this as well.
Created ‎08-08-2014 09:09 AM
The failure of 2 servers in a 3-server cluster
would imply a complete failure of quorum, and the remaining peers will
shut off their client serving port and go into a 'waiting' mode for
members to reappear to form a majority for leader re-election again.
As a result, no clients can connect to the ZK service anymore, for
reads or for writes. Clients will simply receive an ECONNREFUSED
failure at this point. There's no 'read-only' mode by default - every operation
in ZK requires the presence of a quorum. The ZK overview doc mentions
this implicitly
http://archive.cloudera.com/cdh5/cdh/5/zookeeper/zookeeperOver.html:
"""
As long as a majority of the servers are available, the ZooKeeper
service will be available.
"""
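One way to observe this from the outside is to ask each server for its role. This is a rough sketch, assuming the four-letter `srvr` command is reachable on the client port (newer ZooKeeper releases require it to be whitelisted) and that the default client port 2181 is in use. A server that has lost quorum may either refuse the connection, as described above, or reply that it is not currently serving requests; the helper names are hypothetical:

```python
import socket


def parse_mode(response: str) -> str:
    """Extract the server role from an `srvr` reply, or 'not serving'."""
    for line in response.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    # No "Mode:" line, e.g. the server reports it is not serving requests.
    return "not serving"


def server_mode(host: str, port: int = 2181, timeout: float = 2.0) -> str:
    """Send `srvr` to one server and report its role."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"srvr")
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
    except OSError:
        # Covers ECONNREFUSED, timeouts and unreachable hosts.
        return "unreachable"
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))
```

Calling `server_mode(host)` for each member of the ensemble (with your own host names) should show one `leader` and the rest as `follower` while a majority is up, and `unreachable` or `not serving` once quorum is lost.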
Created ‎08-08-2014 10:21 AM
Thanks, the link gives a 404 because there is a colon on the end, but I was still able to get there.
Created ‎08-04-2019 09:20 AM
ZooKeeper works on quorum, and quorum follows the majority rule: if you have 3 servers and one is down, a majority of the servers is still working. You can read further under "Zookeeper Quorum".
