
Random "Clock offset bad" alerts

New Contributor

We are randomly getting the alert "The host's NTP service is not synchronized to any remote server."

These alerts then clear on their own, and the cluster reports that "The health test result for HOST_CLOCK_OFFSET has become good".

 

We are using chronyd.

When I run 'chronyc sources', I see two NTP servers listed, with ^* in front of the first one:

 

MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* ipsap01.ecb.de                2   10   377    371    +40us[  +85us] +/- 8566us
^+ ipsap02.ecb.de                3    8   377     58    +31us[  +31us] +/-   14ms

 

The only thing I found in the cloudera-scm-agent.log file was this:

[root cloudera-scm-agent]# cat cloudera-scm-agent.log.1 | grep chronyc

[13/Nov/2019 14:35:26 +0000] 58865 Monitor-HostMonitor throttling_logger ERROR Timeout with args ['chronyc', 'sources']
Exception: timeout with args ['chronyc', 'sources']
[13/Nov/2019 19:15:24 +0000] 58865 Monitor-HostMonitor throttling_logger ERROR chronyc: chronyc sources: not synchronized to any server
[14/Nov/2019 08:19:25 +0000] 58865 Monitor-HostMonitor throttling_logger ERROR (11 skipped) chronyc: chronyc sources: not synchronized to any server
[14/Nov/2019 12:36:26 +0000] 58865 Monitor-HostMonitor throttling_logger ERROR (1 skipped) chronyc: chronyc sources: not synchronized to any server
[14/Nov/2019 14:36:26 +0000] 58865 Monitor-HostMonitor throttling_logger ERROR (3 skipped) chronyc: chronyc sources: not synchronized to any server
[14/Nov/2019 18:17:28 +0000] 58865 Monitor-HostMonitor throttling_logger ERROR chronyc: chronyc sources: not synchronized to any server
[14/Nov/2019 19:03:26 +0000] 58865 Monitor-HostMonitor throttling_logger ERROR (3 skipped) chronyc: chronyc sources: not synchronized to any server

 

What could be the problem?

 

 

 

 

5 REPLIES

Master Collaborator

Log messages show that chrony loses synchronisation frequently.

 

Are you able to reach the server ipsap01.ecb.de at the times the issue occurs? Its LastRx value (371) is higher than that of the other available server (58).

Please compare this with the servers configured on other hosts that do not report the issue.

Check with your network/OS team about the availability of the NTP servers and about making time synchronization stable on the hosts.
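
To correlate the alerts with server or network availability, it helps to have the exact times at which the agent logged a chronyc failure. The following is a minimal sketch, assuming the log line format shown in the question; the function name sync_error_times is made up for illustration.

```shell
# Reads agent log lines on stdin and prints the timestamp of every ERROR
# line that reports a chronyc timeout or a "not synchronized" result.
# Assumes each such line starts with "[<timestamp>]", as in the excerpt
# posted in the question.
sync_error_times() {
    grep -E 'ERROR .*(not synchronized to any server|Timeout with args)' |
        sed -E 's/^\[([^]]+)\].*/\1/'
}
```

It could be run as `sync_error_times < cloudera-scm-agent.log.1`. The "(N skipped)" markers in the log suggest that some messages were throttled, so the resulting list is probably a lower bound on the number of failures.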

 

 

Expert Contributor

Please check the following:

 

Try running "ntpdate ipsap01.ecb.de" on all hosts and check whether the command reports any issue.

Make sure the chrony/ntp configuration (chrony.conf or ntp.conf) is the same on all nodes.

hwclock --systohc

systemctl restart cloudera-scm-agent
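
One way to check that the configuration is the same on all nodes is to compare a checksum of the file from each host. This is only a sketch: the function name hosts_with_different_config, the host names and the use of ssh are assumptions, and /etc/chrony.conf is the usual default path on RHEL/CentOS 7 and may differ on your systems.

```shell
# Reads "host checksum" pairs on stdin, one pair per line, and prints every
# host whose checksum differs from the checksum on the first line.
hosts_with_different_config() {
    awk 'NR == 1 { ref = $2; next } $2 != ref { print $1 }'
}

# Hypothetical usage (host1..host3 are placeholders):
#   for h in host1 host2 host3; do
#       printf '%s %s\n' "$h" "$(ssh "$h" sha256sum /etc/chrony.conf | cut -d' ' -f1)"
#   done | hosts_with_different_config
```

No output means every host matched the first one in the list.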

Furthermore, if the above does not help, you need to debug the NTP server side.

Run the commands below:

 

ntpq -c pe

In the output, check the refid column: a value of ".INIT." can suggest a communication issue.

 

ntpq -c as

In the output, check the reach column: a value of "no" suggests that the client cannot reach the peer hosts.

 

You probably also need to check the stratum of your NTP servers:

 

The "assID" from ntpq -c as can be used with command ntpq -c "rv assID" to determine the "stratum". The lower the stratum the better. The upper limit for stratum is 15; stratum 16 is used to indicate that a device is unsynchronized.

 

ntpq -c "rv <association_id_from_above_command_output>"
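
The stratum check described above can be sketched as a small filter over the "rv" output. This assumes the output contains a field of the form stratum=N; the function names stratum_of and stratum_ok are made up for illustration.

```shell
# Reads `ntpq -c "rv <assID>"` output on stdin and prints the stratum value,
# or nothing if no stratum=N field is present.
stratum_of() {
    sed -n -E 's/.*stratum=([0-9]+).*/\1/p' | head -n 1
}

# Succeeds (exit 0) only if the stratum read from stdin is between 1 and 15.
# Stratum 16 (unsynchronized) or a missing field makes it fail.
stratum_ok() {
    s=$(stratum_of)
    [ -n "$s" ] && [ "$s" -ge 1 ] && [ "$s" -le 15 ]
}
```

For example, `ntpq -c "rv <assID>" | stratum_ok || echo "peer is unsynchronized"`, with the association ID taken from `ntpq -c as`.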

Expert Contributor

The chronyd commands appear to be similar to the NTP ones. You can refer to this page for debugging:

 

https://www.thegeekdiary.com/centos-rhel-7-tips-on-troubleshooting-ntp-chrony-issues/

 

 

Master Guru

@MihailK On RHEL 7, chronyd takes precedence over NTP. So it is worth checking whether the NTP service is running and, if it is, disabling it. The system should use chronyd only.

Secondly, the agent checks the status of the clock every 2 seconds by reading the output of chronyc sources or ntpq. If it does not find * in the output, it marks that check as failed and triggers an alert. So you also have to verify that the NTP servers stay in sync at all times, and ask your OS team whether they have seen any drops.
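
To see what such a check would report at a given moment, something like the following can be run by hand. It is a rough approximation of the check described above, not the agent's actual code, and the function name has_selected_source is made up for illustration.

```shell
# Reads `chronyc sources` output on stdin. Succeeds (exit 0) when at least
# one source line has "*" in the state column (the second character), which
# is how chronyc marks the source chronyd is currently synchronized to.
has_selected_source() {
    grep -q '^.[*]'
}
```

For example, `chronyc sources | has_selected_source || date` prints the current time whenever no source is selected. Running it in a loop during a period when the alerts appear can show whether chronyd really loses its selected source or whether only the agent's call times out.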


Cheers!
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Master Guru

@MihailK did this resolve the issue? If yes, please take a moment to mark it as the solution. Thanks.

