12-14-2017 09:55 PM - edited 12-14-2017 10:21 PM
Kudu now uses Raft to ensure consensus, so why does it still need NTP? As far as I know, Raft itself doesn't need NTP. What is NTP's responsibility in Kudu? Is it used to ensure scan consistency?
Our tservers and masters keep crashing because NTP falls out of sync, so I have changed max_clock_sync_error_usec to 30000000 for now. Will this affect the cluster?
I also saw the commit for KUDU-1578, which says:
In the case that the clock is out of sync for a significantly long time, the max error will grow large enough to eclipse the 10-second default, at which point it will still crash as before. But, if NTP is properly restored within a few minutes, the server should remain operational.
What's the meaning of max error?
01-02-2018 02:16 PM
The maximum clock error is an NTP concept: it is the upper bound on how far the local machine's clock may deviate from the NTP clock it is synchronized with. Machines with well-configured NTP installations should guarantee some sort of stable maximum clock error that you can use for Kudu's max_clock_sync_error_usec configuration flag. Unfortunately I don't understand Kudu transactions well enough to explain its effect on Kudu transaction semantics.
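To make the relationship between the reported maximum error and the flag concrete, here is a minimal Python sketch. It is not Kudu's code: the function names are hypothetical, the 10-second default is taken from the KUDU-1578 text quoted above, and the linear growth rate of the error while NTP is unsynchronized is an assumed input, not a measured value.

```python
# Illustrative only: these helpers are hypothetical and do not mirror Kudu's internals.

# The 10-second default mentioned in the KUDU-1578 commit message, in microseconds.
DEFAULT_MAX_CLOCK_SYNC_ERROR_USEC = 10_000_000


def clock_error_acceptable(max_error_usec,
                           threshold_usec=DEFAULT_MAX_CLOCK_SYNC_ERROR_USEC):
    """Return True if the reported maximum clock error is within the threshold."""
    return max_error_usec <= threshold_usec


def seconds_until_threshold(current_error_usec, growth_usec_per_sec,
                            threshold_usec=DEFAULT_MAX_CLOCK_SYNC_ERROR_USEC):
    """Estimate how long an unsynchronized clock can run before its maximum
    error passes the threshold, assuming the error grows linearly."""
    if growth_usec_per_sec <= 0:
        raise ValueError("growth_usec_per_sec must be positive")
    remaining_usec = threshold_usec - current_error_usec
    return max(0.0, remaining_usec / growth_usec_per_sec)
```

For example, `clock_error_acceptable(15_000_000)` is false under the default threshold but true with `threshold_usec=30_000_000`, which is the practical effect of raising the flag: a larger error is tolerated before the server gives up, at the cost of a looser bound on how far clocks may disagree.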
01-02-2018 05:49 PM - edited 01-02-2018 05:51 PM
Can I summarize it like this: NTP is used to make MVCC and READ_AT_SNAPSHOT scans accurate. If max_clock_sync_error_usec is large, the deviation in scan results will be larger too, and vice versa?
01-03-2018 03:24 PM
Kudu's transaction system tags mutations with a timestamp that has a wall-time component.
Beyond powering distributed consistency semantics (i.e. between different tablets, each running Raft internally), these timestamps can be used to run point-in-time scans, if the system is set up appropriately. As part of the algorithm, servers send the timestamps to each other all the time, either directly or through clients, and then use those to update their own clocks.
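The idea of servers updating their clocks from timestamps they receive can be sketched as a generic hybrid (wall time plus logical counter) clock. This is a simplified illustration in Python, not Kudu's actual implementation; the class name, the tuple representation, and the injectable `now_usec` function are all assumptions made for the example.

```python
import time


class HybridClockSketch:
    """Toy hybrid clock: timestamps are (physical_usec, logical) tuples."""

    def __init__(self, now_usec=None):
        # now_usec is injectable so the clock can be driven by a fake time source.
        self._now_usec = now_usec or (lambda: time.time_ns() // 1000)
        self._last_physical = 0
        self._logical = 0

    def now(self):
        """Return a timestamp strictly greater than any previously issued or seen."""
        physical = self._now_usec()
        if physical > self._last_physical:
            # Wall clock moved forward: adopt it and reset the logical counter.
            self._last_physical = physical
            self._logical = 0
        else:
            # Wall clock stalled or is behind: only the logical part advances.
            self._logical += 1
        return (self._last_physical, self._logical)

    def update(self, received):
        """Move this clock forward if a received timestamp is ahead of it."""
        if received > (self._last_physical, self._logical):
            self._last_physical, self._logical = received
```

In this sketch, a server whose wall clock is behind a timestamp it has received keeps issuing timestamps above the received one rather than going backwards. The further apart the wall clocks are, the less the physical component of a timestamp says about real time, which is why point-in-time scans depend on bounded clock error.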
If the servers' clocks are out of sync by a lot, then the point-in-time scans lose meaning. Moreover, other weird things might happen, like a server crashing and coming back with a lower wall-clock time.
We're working on an alternative that will spare users from having to deal with this problem, but in the meantime I'd suggest setting up NTP properly.
One possible layout that we've seen work in the past is to have a couple of NTP time masters close to or within the Kudu cluster, and then have the servers be NTP peers of each other.