
[Kudu][NTP] Kudu uses Raft to ensure consensus, so what is the responsibility of NTP in Kudu?

Explorer

Hi,

 

Kudu now uses Raft to ensure consensus, so why does it still need NTP (as far as I know, Raft itself doesn't need NTP)? What is the responsibility of NTP in Kudu? Is it used to ensure scan consistency?

 

Our tablet servers and masters keep crashing because NTP falls out of sync, and I have now changed max_clock_sync_error_usec to 30000000. Will this affect the cluster?
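For reference, I set it roughly like this (whether the flag belongs in a gflagfile or on the startup command line depends on how the daemons are launched in your deployment; the default is 10000000, i.e. 10 seconds):

    # on each tablet server and master (illustrative placement)
    --max_clock_sync_error_usec=30000000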

 

I also saw the commit for KUDU-1578, which says:

In the case that the clock is out of sync for a significantly long time,
the max error will grow large enough to eclipse the 10-second default,
at which point it will still crash as before. But, if NTP is properly
restored within a few minutes, the server should remain operational.

What's the meaning of max error?

 

Best regards,

Tony

 

1 ACCEPTED SOLUTION

New Contributor

Kudu's transaction system tags mutations with a timestamp that has a wall-time component.

 

Beyond powering distributed consistency semantics (i.e. between different tablets, each running Raft internally), these timestamps can be used to run point-in-time scans, if the system is set up appropriately. As part of the algorithm, servers constantly send timestamps to each other, either directly or through clients, and use them to update their own clocks.
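As a rough sketch of that idea (just an illustration of a hybrid clock, not Kudu's actual implementation), each timestamp combines a wall-clock reading with a logical counter, and a server never hands out a timestamp lower than one it has already seen from a peer:

    import time

    class HybridClock:
        # Toy hybrid clock: wall-clock microseconds plus a logical counter.
        def __init__(self):
            self.last_physical = 0
            self.logical = 0

        def now(self):
            physical = int(time.time() * 1_000_000)
            if physical > self.last_physical:
                self.last_physical, self.logical = physical, 0
            else:
                self.logical += 1  # clock hasn't advanced: bump the logical part
            return (self.last_physical, self.logical)

        def observe(self, received):
            # On every incoming message, move the local clock forward so it is
            # never behind the sender's timestamp.
            if received >= (self.last_physical, self.logical):
                self.last_physical, self.logical = received
            return self.now()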

 

If the servers' clocks are out of sync by a lot, then point-in-time scans lose meaning; moreover, other strange things might happen, such as a server crashing and coming back up with a lower wall-clock time.

 

We're working on an alternative that will avoid users having to deal with this problem, but in the meantime I'd suggest setting up NTP properly.

 

One possible layout that we've seen work in the past is to have a couple of NTP time masters close to or within the Kudu cluster, and then have the servers act as NTP peers to each other.
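For example (hostnames are placeholders, and the exact ntp.conf will depend on your environment), each Kudu node could sync to the internal time servers and peer with the other nodes:

    # /etc/ntp.conf on a Kudu node (illustrative)
    server ntp1.internal iburst
    server ntp2.internal iburst
    peer   kudu-node2.internal
    peer   kudu-node3.internal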

 


4 REPLIES

Expert Contributor

NTP synchronization (and specifically enforcing a maximum clock error on each node) helps guarantee Kudu's transaction semantics. This page has more details, as does the Kudu design paper.

 

The maximum clock error is an NTP concept: it is the upper bound on how far the local machine's clock may deviate from the NTP source it's synchronized with. Machines with well-configured NTP installations should maintain a stable maximum clock error that you can use for Kudu's max_clock_sync_error_usec configuration flag. Unfortunately I don't understand Kudu transactions well enough to explain the effect this has on Kudu's transaction semantics.
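As far as I understand, you can see the current value on a node with the ntptime utility (assuming the classic ntpd; the numbers below are placeholders), and its "maximum error" field, in microseconds, is the quantity compared against max_clock_sync_error_usec. It also keeps growing the longer the clock stays undisciplined, which is the "max error will grow" behaviour the KUDU-1578 commit message describes:

    $ ntptime | grep "maximum error"
      maximum error 224986 us, estimated error 383 us,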

avatar
Explorer

Can I summarize it as: NTP is used to make MVCC and READ_AT_SNAPSHOT scans accurate, and a larger max_clock_sync_error_usec means the possible scan deviation is larger too, and vice versa?


Explorer
@dalves, thanks for your quick reply.