Created 02-20-2018 11:35 AM
Can anyone tell me what happens in the following scenario?
* There are five ZooKeeper servers: s1, s2, s3, s4 and s5
* When the client connected to s3, it was up to date
* The client made a write request to s3 to create the /test node; s3 forwarded it to the leader (s5)
* s1, s2 and s5 completed that request successfully, so the client got a success message
* But due to a network problem, s3 was not able to apply the change
* The client is still connected to s3
* Now the client makes a read request to s3 for the /test node
What will happen now? Will s3 throw a node-not-found error, or will something else happen?
Created 02-20-2018 12:08 PM
I don't know the details of the ZooKeeper implementation, but according to the documentation it makes certain guarantees (sequential consistency, atomicity, single system image, reliability, timeliness).
Given those guarantees, s3 will have the correct information in the node /test once the client has received a success message. I am not sure whether, in your example, that means the client would not get a success message.
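One way to see why the client can get a success message even though s3 lags: a write commits once a strict majority (quorum) of the ensemble acknowledges it, not all servers. The sketch below is a toy model of that rule only, not ZooKeeper's actual code; the server names follow the example above.

```python
# Toy model of majority-quorum commit (illustrative only; the real
# protocol is Zab, which also orders proposals and syncs followers).

def quorum_commit(acks, ensemble_size=5):
    """A proposal commits once a strict majority of the ensemble acks it."""
    return len(acks) > ensemble_size // 2

# s1, s2 and the leader s5 acknowledged; s3 is partitioned.
acks = {"s1", "s2", "s5"}
print(quorum_commit(acks))  # True: 3 of 5 is a majority, client sees success
print(quorum_commit({"s1", "s2"}))  # False: 2 of 5 is not enough
```

This is why the question is a real one: the quorum that committed the write need not include the server the client is reading from.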
Created 02-20-2018 04:55 PM
You probably want to look at Zab for information on how distributed consensus is achieved:
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0
As Harald said, the situation you have described is guaranteed not to happen, as this is exactly the kind of problem ZK was built to prevent.
Created 02-21-2018 11:59 AM
Yes, dude, you are right, ZooKeeper takes action to prevent this scenario.
But my question is: what is that action?
How will the client come to know that the ZooKeeper server it is connected to is lagging behind?
Created 02-21-2018 12:25 PM
The basic idea is that the ZooKeeper server will tell the client that it is lagging behind. In your scenario, s3 would be a follower of s5 (the leader). Any change is requested through the leader (s5), which reports success to the client only once a quorum of the ensemble has acknowledged the change. But of course, in this phase the network between s3 and s5 can fail; after a timeout, s5 will drop s3 as a follower and still send a success to the client. At this point s3 also notices that it is no longer a follower, because either:
1. the network failed in both directions: s3 has also missed the heartbeats from s5, and therefore now knows that the leader is gone; or
2. the network failed only in the direction s3 -> s5: s3 still processed the update, but s5 drops s3 as a follower because it never receives the acknowledgement. s5 then stops sending heartbeats and sync messages, so for s3 the same situation as in 1. occurs.
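The failure detection described in both cases can be sketched as a heartbeat timeout. This is a toy sketch, not ZooKeeper's real code; the class and timeout value are made up for illustration. The key point is that a follower that stops hearing from the leader within the timeout leaves the FOLLOWING state and stops serving clients, so the client's session moves to another server instead of reading stale data.

```python
import time

# Toy sketch of follower-side failure detection (not the real ZooKeeper
# implementation): a follower that misses leader heartbeats for longer
# than the timeout drops out of FOLLOWING and goes back to election.

class Follower:
    def __init__(self, heartbeat_timeout=2.0):
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat = time.monotonic()
        self.state = "FOLLOWING"

    def on_heartbeat(self):
        """Called whenever a heartbeat from the leader arrives."""
        self.last_heartbeat = time.monotonic()

    def check_leader(self, now=None):
        """If the leader has been silent too long, stop following."""
        now = time.monotonic() if now is None else now
        if now - self.last_heartbeat > self.heartbeat_timeout:
            # Leader looks dead from here: stop serving clients, go elect.
            self.state = "LOOKING"
        return self.state

f = Follower(heartbeat_timeout=2.0)
print(f.check_leader(now=f.last_heartbeat + 1.0))  # FOLLOWING
print(f.check_leader(now=f.last_heartbeat + 3.0))  # LOOKING
```

In real ZooKeeper the tunable is the sync/session timeout family of settings; the numbers here are arbitrary.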
Besides this situation, the most interesting part is the new leader election when the communication between the current leader and a follower is broken. It is important to ensure that two leaders are not elected at the same time.
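The reason two simultaneous leaders are impossible follows from the same majority rule: any two majorities of a five-server ensemble must share at least one server, so two network partitions can never both elect a leader, and the new leader's quorum always overlaps the quorum that committed earlier writes. A toy exhaustive check (not Zab's actual election algorithm):

```python
from itertools import combinations

# Toy check: enumerate every strict majority of a 5-server ensemble and
# verify that any two of them intersect. This is why two partitions can
# never both elect a leader, and why a new leader's quorum always
# contains at least one server that saw every committed write.

servers = ["s1", "s2", "s3", "s4", "s5"]
majorities = [set(c) for k in (3, 4, 5) for c in combinations(servers, k)]

all_intersect = all(a & b for a in majorities for b in majorities)
print(all_intersect)  # True: no two disjoint majorities exist
```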