06-15-2017 05:30 PM
06-16-2017 01:21 AM
06-19-2017 07:45 AM
I was able to resolve it.
If ntptime reports an error like the one below, the Kudu service will most likely go down. You have to restart ntpd, and then the error goes away.
[root@wcw0hd3dn02 ~]# ntptime
ntp_gettime() returns code 5 (ERROR)
time dce7466c.fc37b000 Sun, Jun 11 2017 0:32:44.985, (.985225),
maximum error 16000000 us, estimated error 16 us, TAI offset 0
ntp_adjtime() returns code 5 (ERROR)
modes 0x0 (),
offset 0.000 us, frequency 0.000 ppm, interval 1 s,
maximum error 16000000 us, estimated error 16 us,
status 0x4041 (PLL,UNSYNC,MODE),
time constant 7, precision 1.000 us, tolerance 500 ppm,
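A rough way to check for this state from a script is to look at the text ntptime prints rather than eyeballing it. The sketch below is only an illustration: the function name ntptime_is_synced is made up here, and it assumes the output format matches what is pasted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of ntp or Kudu): reads `ntptime` output on
# stdin and succeeds only if both kernel calls report "code 0 (OK)" and the
# status line does not carry the UNSYNC flag.
ntptime_is_synced() {
    awk '
        /ntp_gettime\(\) returns code 0 \(OK\)/ { gettime_ok = 1 }
        /ntp_adjtime\(\) returns code 0 \(OK\)/ { adjtime_ok = 1 }
        /status .*UNSYNC/                       { unsync = 1 }
        END { exit !(gettime_ok && adjtime_ok && !unsync) }
    '
}
```

It could be used as `ntptime | ntptime_is_synced || echo "clock not synchronized"`.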
This error occurs if you run ntpd with the -x option:
[root@wuwcw0hd3mn01 ~]# ps -ef|grep ntp
root 3183 2731 0 10:38 pts/0 00:00:00 grep ntp
ntp 20736 1 0 Jun13 ? 00:00:19 ntpd -x -u ntp:ntp -p /var/run/ntpd.pid -g
Remove -x from the file below and restart ntpd:
[root@wuwcw0hd3mn01 ~]# more /etc/sysconfig/ntpd
# Drop root to id 'ntp:ntp' by default.
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid -g"
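If you would rather script the change than edit the file by hand, something like the following could work. It is a sketch only: strip_slew_flag is a made-up name, and it assumes the OPTIONS line is double-quoted and that -x is followed or preceded by whitespace, as in the file shown above.

```shell
#!/bin/sh
# Hypothetical filter: reads an /etc/sysconfig/ntpd-style file on stdin and
# writes it to stdout with the -x flag removed from the OPTIONS= line.
# All other lines pass through unchanged.
strip_slew_flag() {
    sed -E \
        -e '/^OPTIONS=/ s/(["[:space:]])-x[[:space:]]+/\1/' \
        -e '/^OPTIONS=/ s/[[:space:]]+-x"/"/'
}
```

One cautious way to apply it is `strip_slew_flag < /etc/sysconfig/ntpd > /tmp/ntpd.new`, review the result, copy it back over the original, and then run `service ntpd restart` (the command on CentOS 6; other distributions may differ).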
Wait for NTP to synchronize. After that, I haven't seen any issue with the Kudu service so far.
ntp_gettime() returns code 0 (OK)
time dcf260c3.66c6abfc Mon, Jun 19 2017 10:40:03.401, (.401469911),
maximum error 394157 us, estimated error 345 us, TAI offset 0
ntp_adjtime() returns code 0 (OK)
modes 0x0 (),
offset -707.277 us, frequency 20.094 ppm, interval 1 s,
maximum error 394157 us, estimated error 345 us,
status 0x6001 (PLL,NANO,MODE),
time constant 10, precision 0.001 us, tolerance 500 ppm,
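The number that changed most between the two outputs is the maximum error (16000000 us before, 394157 us after). A small sketch for pulling that value out and comparing it to a limit is below. The function names are made up, and the limit is whatever you pass in; to my understanding Kudu's max_clock_sync_error_usec flag defaults to 10000000, but please check that against your version rather than taking it from here.

```shell
#!/bin/sh
# Hypothetical helper: prints the first "maximum error N us" value found in
# `ntptime` output read from stdin.
ntptime_max_error_us() {
    awk '/maximum error [0-9]+ us/ {
        for (i = 2; i < NF; i++)
            if ($(i - 1) == "maximum" && $i == "error") { print $(i + 1); exit }
    }'
}

# Hypothetical helper: succeeds if the maximum error read from stdin is at
# most the limit (in microseconds) given as the first argument.
max_error_within() {
    limit_us=$1
    err_us=$(ntptime_max_error_us)
    [ -n "$err_us" ] && [ "$err_us" -le "$limit_us" ]
}
```

For example, `ntptime | max_error_within 10000000 && echo "within limit"`.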
See also: https://access.redhat.com/solutions/38542
06-19-2017 11:04 PM
Thank you @MSharma for updating us on your findings!
That is very interesting. It sounds like you enabled NTP stepping (which -x disables) whereas before it could only use slewing. Apparently stepping has kept you from falling too far out of sync from the time source for Kudu to tolerate.
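To put rough numbers on why slew-only mode can lag behind: per the ntpd(8) man page, slewing is limited to 500 ppm, and -x raises the step threshold from 128 ms to 600 s. The arithmetic below is just an illustration of that limit (the function name is made up), not a statement about what happened on any particular host.

```shell
#!/bin/sh
# Hypothetical helper: seconds needed to slew away an offset given in
# milliseconds, at the 500 ppm maximum slew rate documented in ntpd(8).
# 500 ppm is 0.5 ms of correction per second, so each ms of offset takes 2 s.
slew_seconds_for_offset_ms() {
    echo $(( $1 * 2 ))
}
```

So a 1-second offset (`slew_seconds_for_offset_ms 1000`) needs about 2000 seconds of slewing, and an offset near the 600 s threshold would need about 1200000 seconds, which is close to 14 days.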
I just checked one of the more long-lived and stable test environments I periodically use (note: it is NOT a production system), where I have run many different versions of Kudu over the years, and I do not have -x set in OPTIONS. On that machine (running CentOS 6.6) there is only the following in /etc/sysconfig/ntpd:
# Drop root to id 'ntp:ntp' by default.
OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid -g"
It may be worth noting that this machine also has the following set in /etc/ntp.conf:
# tinker panic 0 instructs NTP not to give up if it sees a large jump in time.
# This is important for coping with large time drifts and also resuming virtual
# machines from their suspended state.
tinker panic 0
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
# Drift file. Put this in a directory which the daemon can write to.
# No symbolic links allowed, either, since the daemon updates the file
# by creating a temporary in the same directory and then rename()'ing
# it to the file.
driftfile /var/lib/ntp/drift
I can't tell you whether or not this is an ideal NTP configuration, or whether it is fully correct, but it seems stable.
For those who want more insight into what -x and slewing mean, I'd recommend looking at the ntpd(8) man page and searching for the keyword "slew": https://linux.die.net/man/8/ntpd
For those having problems with NTP stability in general, also consider reading through the "NTP Debugging Techniques" section of the Official NTP Documentation: http://doc.ntp.org/4.2.6p5/debug.html
06-29-2017 08:27 AM
06-30-2017 08:34 AM
06-30-2017 08:39 AM
Any chance some of you are running on Azure? It has known issues with ntp: https://social.msdn.microsoft.com/Forums/azure/en-US/8c0a1026-0b02-405a-848e-628e68229eaf/i-have-a-l...