Support Questions

cjervis · ‎11-15-2019

We are randomly getting "The host's NTP service is not synchronized to any remote server."

These alerts then clear on their own, and the cluster reports that "The health test result for HOST_CLOCK_OFFSET has become good".

We are using chronyd.

When I run 'chronyc sources', I see two NTP servers listed, with ^* in front of the first one.

MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* ipsap01.ecb.de                2   10   377    371    +40us[  +85us] +/- 8566us
^+ ipsap02.ecb.de                3    8   377     58    +31us[  +31us] +/-   14ms

The only thing I found in cloudera-scm-agent.log file was this:

[root cloudera-scm-agent]# cat cloudera-scm-agent.log.1 | grep chronyc

[13/Nov/2019 14:35:26 +0000] 58865 Monitor-HostMonitor throttling_logger ERROR Timeout with args ['chronyc', 'sources']
Exception: timeout with args ['chronyc', 'sources']
[13/Nov/2019 19:15:24 +0000] 58865 Monitor-HostMonitor throttling_logger ERROR chronyc: chronyc sources: not synchronized to any server
[14/Nov/2019 08:19:25 +0000] 58865 Monitor-HostMonitor throttling_logger ERROR (11 skipped) chronyc: chronyc sources: not synchronized to any server
[14/Nov/2019 12:36:26 +0000] 58865 Monitor-HostMonitor throttling_logger ERROR (1 skipped) chronyc: chronyc sources: not synchronized to any server
[14/Nov/2019 14:36:26 +0000] 58865 Monitor-HostMonitor throttling_logger ERROR (3 skipped) chronyc: chronyc sources: not synchronized to any server
[14/Nov/2019 18:17:28 +0000] 58865 Monitor-HostMonitor throttling_logger ERROR chronyc: chronyc sources: not synchronized to any server
[14/Nov/2019 19:03:26 +0000] 58865 Monitor-HostMonitor throttling_logger ERROR (3 skipped) chronyc: chronyc sources: not synchronized to any server

What could be the problem?

paras · ‎11-17-2019

Log messages show that chrony loses synchronisation frequently.
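To put a rough number on how often synchronisation is lost, the agent log can be summarised per day. The sketch below is only an illustration: the function name sync_errors_per_day is made up, and it assumes that "(N skipped)" means N further identical errors were suppressed by the throttling logger and belong to the same day as the line that reports them.

```shell
# Reads cloudera-scm-agent log lines on stdin and prints "<date> <count>"
# for each day with "not synchronized" errors.
# Assumption: "(N skipped)" stands for N suppressed repeats of the same error.
sync_errors_per_day() {
    awk '/not synchronized to any server/ {
        day = substr($1, 2)                    # "[13/Nov/2019" -> "13/Nov/2019"
        n = 1
        if (match($0, /\([0-9]+ skipped\)/)) {
            n += substr($0, RSTART + 1) + 0    # leading digits of "N skipped)"
        }
        total[day] += n
        if (!(day in seen)) { seen[day] = 1; order[++k] = day }
    }
    END { for (i = 1; i <= k; i++) print order[i], total[order[i]] }'
}

# Possible usage on an affected host:
#   sync_errors_per_day < cloudera-scm-agent.log.1
```

Under that assumption, the excerpt in the question works out to 1 error on 13/Nov/2019 and 23 on 14/Nov/2019. Running the same summary on a host that does not raise the alert would give a baseline to compare against.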

Are you also able to reach the server ipsap01.ecb.de at the time of the issue? Its LastRx value is higher than that of the other available server.

Please compare the performance with servers configured on other hosts which do not report the issue.

Check with your network/OS team about the availability of the NTP servers and about making time synchronization stable on the hosts.

sagarshimpi · ‎11-17-2019

Please check the following:

Run "ntpdate ipsap01.ecb.de" on all hosts and check whether the command reports any issue.

Make sure chrony.conf (or ntp.conf) is the same on all nodes.

hwclock --systohc

systemctl restart cloudera-scm-agent

Furthermore, if the above doesn't help, you need to debug on the NTP server side.

Execute the commands below:

ntpq -c pe

The output shown is good, but note that if the refid column indicates ".INIT." it can suggest a communication issue.

ntpq -c as

The output below is good; however, if the reach column indicates "no", it suggests that the client cannot reach the peer hosts.

You probably need to check the stratum of your NTP servers:

The "assID" from ntpq -c as can be used with command ntpq -c "rv assID" to determine the "stratum". The lower the stratum the better. The upper limit for stratum is 15; stratum 16 is used to indicate that a device is unsynchronized.

ntpq -c "rv <association_id_from_above_command_output>"
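The stratum check can be scripted so that it is easy to repeat for each association. This is a minimal sketch: the helper names stratum_of and classify_stratum are hypothetical, and it assumes the "rv" output contains a "stratum=N" field.

```shell
# Reads the output of: ntpq -c "rv <assID>"  on stdin and prints the stratum.
# Assumption: the output carries a "stratum=N" field.
stratum_of() {
    sed -n 's/.*stratum=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Prints a verdict for a numeric stratum: 16 (or more) means unsynchronized.
classify_stratum() {
    if [ "$1" -ge 16 ]; then
        echo "unsynchronized"
    else
        echo "synchronized"
    fi
}

# Possible usage, with <assID> taken from "ntpq -c as":
#   classify_stratum "$(ntpq -c "rv <assID>" | stratum_of)"
```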

sagarshimpi · ‎11-17-2019

The chronyd commands appear to be similar to the NTP ones. You can refer to this for debugging:

https://www.thegeekdiary.com/centos-rhel-7-tips-on-troubleshooting-ntp-chrony-issues/

GangWar · ‎11-19-2019

@MihailK In RHEL 7, chronyd takes precedence over NTP. So it's worth checking whether the NTP service is running; if it is, disable it. The system should use chronyd only.

Secondly, the agent checks the clock status every 2 seconds by reading the output of chronyc sources or ntpq. If it does not find * in the output, it marks that check as failed and triggers an alert. So you also have to make sure the NTP servers stay in sync at all times, and ask your OS team whether they see any drops.
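The same kind of check can be reproduced by hand to catch the moments when no source is selected. The sketch below is not the agent's actual code; the function name has_selected_source is made up, and it simply looks for a source line whose state column is * (for example ^*).

```shell
# Reads `chronyc sources` output on stdin. Succeeds (exit status 0) only if
# some source line has "*" in the state column, i.e. starts with ^*, =* or #*.
has_selected_source() {
    grep -q '^[#=^]\*'
}

# Possible usage on a host: print a timestamp whenever no source is selected.
#   chronyc sources | has_selected_source || echo "$(date -u) no selected source"
```

Running this from a loop or cron job alongside the agent would show whether the alerts line up with real gaps in chronyd's source selection or only with the agent's own timeouts.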

Cheers!
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

GangWar · ‎11-28-2019

@MihailK did this resolve the issue? If yes, please take a moment to mark this as the solution. Thanks.

Cheers!

Cloudera Community

Random "Clock offset bad" alerts