Created on 12-17-2020 02:30 AM - edited 12-17-2020 02:36 AM
We are facing two issues on this production server: the DNS test fails and the NTP server connection times out.
[17/Dec/2020 00:41:47 +0000] 9333 Monitor-HostMonitor throttling_logger ERROR Timeout with args ['ntpq', '-np']
[17/Dec/2020 00:41:47 +0000] 9333 Monitor-HostMonitor throttling_logger ERROR Failed to collect NTP metrics
[17/Dec/2020 00:42:08 +0000] 9333 MainThread agent ERROR Failed to configure inotify. Parcel repository will not auto-refresh.
[17/Dec/2020 01:31:26 +0000] 9333 Monitor-HostMonitor throttling_logger ERROR Timed out waiting for worker process collecting filesystem usage to complete. This may occur if the host has an NFS or other remote filesystem that is not responding to requests in a timely fashion. Current nodev filesystems: /dev/shm,/run,/sys/fs/cgroup,/run/cloudera-scm-agent/process,/run/cloudera-scm-agent/process,/run/user/0
[17/Dec/2020 01:31:54 +0000] 9333 MainThread agent ERROR Failed to configure inotify. Parcel repository will not auto-refresh.
[17/Dec/2020 08:18:56 +0000] 9333 MonitorDaemon-Reporter throttling_logger ERROR Error sending messages to firehose: mgmt-SERVICEMONITOR-b9bbe3508c15c97839a21fc44a6226b5
[17/Dec/2020 10:07:14 +0000] 9333 MainThread agent ERROR Failed to configure inotify. Parcel repository will not auto-refresh.
[17/Dec/2020 11:15:43 +0000] 9333 MainThread agent ERROR Failed to configure inotify. Parcel repository will not auto-refresh.
[17/Dec/2020 11:31:05 +0000] 9333 DnsResolutionMonitor throttling_logger ERROR Timeout with args ['/usr/java/jdk1.8.0_251-amd64/bin/java', '-classpath', '/opt/cloudera/cm/lib/agent-6.3.0.jar', 'com.cloudera.cmon.agent.DnsTest']
[17/Dec/2020 11:31:05 +0000] 9333 DnsResolutionMonitor throttling_logger ERROR Failed to run DnsTest.
[17/Dec/2020 11:31:18 +0000] 9333 MainThread agent ERROR Failed to configure inotify. Parcel repository will not auto-refresh.
[17/Dec/2020 12:13:44 +0000] 9333 MainThread agent ERROR Failed to configure inotify. Parcel repository will not auto-refresh.
[ PROD
[17/Dec/2020 11:07:11 +0000] 9333 MainThread agent WARNING Long HB processing time: 16.7383139133
[17/Dec/2020 11:07:23 +0000] 9333 Monitor-HostMonitor filesystem_map WARNING Failed to join worker process collecting filesystem usage. All nodev filesystems will have unknown usage until the worker process is no longer active. Current nodev filesystems: /dev/shm,/run,/sys/fs/cgroup,/run/cloudera-scm-agent/process,/run/cloudera-scm-agent/process,/run/user/0
[17/Dec/2020 11:15:29 +0000] 9333 MainThread agent WARNING Supervisor failed (pid 97042). Restarting agent.
[17/Dec/2020 11:15:43 +0000] 9333 MainThread agent ERROR Failed to configure inotify. Parcel repository will not auto-refresh.
[17/Dec/2020 11:15:43 +0000] 9333 MainThread throttling_logger WARNING Failed parsing alternatives line: libnssckbi.so.x86_64 string index out of range link currently points to /usr/lib64/pkcs11/p11-kit-trust.so
[17/Dec/2020 11:15:48 +0000] 9333 MainThread agent WARNING Long HB processing time: 5.60892701149
[17/Dec/2020 11:30:53 +0000] 9333 Monitor-HostMonitor filesystem_map WARNING Failed to join worker process collecting filesystem usage. All nodev filesystems will have unknown usage until the worker process is no longer active. Current nodev filesystems: /dev/shm,/run,/sys/fs/cgroup,/run/cloudera-scm-agent/process,/run/cloudera-scm-agent/process,/run/user/0
[17/Dec/2020 11:30:54 +0000] 9333 MainThread agent WARNING Long HB processing time: 33.9636788368
[17/Dec/2020 11:30:54 +0000] 9333 MainThread agent WARNING Delayed HB: 19s since last
[17/Dec/2020 11:31:05 +0000] 9333 DnsResolutionMonitor throttling_logger ERROR Timeout with args ['/usr/java/jdk1.8.0_251-amd64/bin/java', '-classpath', '/opt/cloudera/cm/lib/agent-6.3.0.jar', 'com.cloudera.cmon.agent.DnsTest']
[17/Dec/2020 11:31:05 +0000] 9333 DnsResolutionMonitor throttling_logger ERROR Failed to run DnsTest.
[17/Dec/2020 11:31:09 +0000] 9333 MainThread agent WARNING Supervisor failed (pid 97042). Restarting agent.
[17/Dec/2020 11:31:18 +0000] 9333 MainThread agent ERROR Failed to configure inotify. Parcel repository will not auto-refresh.
[17/Dec/2020 11:31:18 +0000] 9333 MainThread throttling_logger WARNING Failed parsing alternatives line: libnssckbi.so.x86_64 string index out of range link currently points to /usr/lib64/pkcs11/p11-kit-trust.so
[17/Dec/2020 11:31:23 +0000] 9333 MainThread agent WARNING Long HB processing time: 5.59937500954
[17/Dec/2020 12:07:16 +0000] 9333 MainThread agent WARNING Long HB processing time: 18.2336220741
[17/Dec/2020 12:07:16 +0000] 9333 MainThread agent WARNING Delayed HB: 3s since last
[17/Dec/2020 12:07:21 +0000] 9333 Monitor-HostMonitor filesystem_map WARNING Failed to join worker process collecting filesystem usage. All nodev filesystems will have unknown usage until the worker process is no longer active. Current nodev filesystems: /dev/shm,/run,/sys/fs/cgroup,/run/cloudera-scm-agent/process,/run/cloudera-scm-agent/process,/run/user/0
[17/Dec/2020 12:13:34 +0000] 9333 MainThread agent WARNING Supervisor failed (pid 97042). Restarting agent.
[17/Dec/2020 12:13:44 +0000] 9333 MainThread agent ERROR Failed to configure inotify. Parcel repository will not auto-refresh.
[17/Dec/2020 12:13:44 +0000] 9333 MainThread throttling_logger WARNING Failed parsing alternatives line: libnssckbi.so.x86_64 string index out of range link currently points to /usr/lib64/pkcs11/p11-kit-trust.so
[17/Dec/2020 12:13:50 +0000] 9333 MainThread agent WARNING Long HB processing time: 5.56325793266
[ PROD root@
[17/Dec/2020 11:30:53 +0000] 9333 Monitor-HostMonitor filesystem_map WARNING Failed to join worker process collecting filesystem usage. All nodev filesystems will have unknown usage until the worker process is no longer active. Current nodev filesystems: /dev/shm,/run,/sys/fs/cgroup,/run/cloudera-scm-agent/process,/run/cloudera-scm-agent/process,/run/user/0
[17/Dec/2020 11:30:54 +0000] 9333 MainThread agent WARNING Long HB processing time: 33.9636788368
[17/Dec/2020 11:30:54 +0000] 9333 MainThread agent WARNING Delayed HB: 19s since last
[17/Dec/2020 11:31:05 +0000] 9333 DnsResolutionMonitor throttling_logger ERROR Timeout with args ['/usr/java/jdk1.8.0_251-amd64/bin/java', '-classpath', '/opt/cloudera/cm/lib/agent-6.3.0.jar', 'com.cloudera.cmon.agent.DnsTest']
None
[17/Dec/2020 11:31:05 +0000] 9333 DnsResolutionMonitor throttling_logger ERROR Failed to run DnsTest.
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/monitor/host/dns_names.py", line 87, in collect_dns_metrics
self._subprocess_with_timeout(args, self._poll_timeout)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/monitor/host/dns_names.py", line 59, in _subprocess_with_timeout
return subprocess_with_timeout(args, timeout)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/subprocess_timeout.py", line 95, in subprocess_with_timeout
raise Exception("timeout with args %s" % args)
Exception: timeout with args ['/usr/java/jdk1.8.0_251-amd64/bin/java', '-classpath', '/opt/cloudera/cm/lib/agent-6.3.0.jar', 'com.cloudera.cmon.agent.DnsTest']
[17/Dec/2020 11:31:09 +0000] 9333 MainThread agent WARNING Supervisor failed (pid 97042). Restarting agent.
[17/Dec/2020 11:31:11 +0000] 9333 MainThread agent INFO ================================================================================
[17/Dec/2020 11:31:11 +0000] 9333 MainThread agent INFO SCM Agent Version: 6.3.0
[17/Dec/2020 11:31:11 +0000] 9333 MainThread agent INFO Agent Protocol Version: 4
[17/Dec/2020 11:31:11 +0000] 9333 MainThread __init__ INFO Agent UUID file was last modified at 2020-06-22 17:15:03.518251
[17/Dec/2020 11:31:11 +0000] 9333 MainThread agent INFO Using Host ID: 2b537ad6-388e-4e32-bea2-7584f509d4df
[17/Dec/2020 11:31:11 +0000] 9333 MainThread agent INFO Using directory: /run/cloudera-scm-agent
[17/Dec/2020 11:31:11 +0000] 9333 MainThread agent INFO Using supervisor binary path: /opt/cloudera/cm-agent/bin/../bin/supervisord
[17/Dec/2020 11:31:11 +0000] 9333 MainThread agent INFO Agent Logging Level: INFO
[17/Dec/2020 11:31:11 +0000] 9333 MainThread agent INFO Agent config:
[17/Dec/2020 11:31:11 +0000] 9333 MainThread agent INFO Security.use_tls = 0
[17/Dec/2020 11:31:11 +0000] 9333 MainThread agent INFO Security.max_cert_depth = 9
[17/Dec/2020 11:3
[17/Dec/2020 10:07:14 +0000] 9333 MainThread agent ERROR Failed to configure inotify. Parcel repository will not auto-refresh.
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1007, in _init_after_first_heartbeat_response
self.inotify = self.repo.configure_inotify()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/parcel.py", line 408, in configure_inotify
wm = pyinotify.WatchManager()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/pyinotify.py", line 1783, in __init__
raise OSError(err % self._inotify_wrapper.str_errno())
OSError: Cannot initialize new instance of inotify, Errno=Too many open files (EMFILE)
[17/Dec/2020 10:07:14 +0000] 9333 MainThread downloader INFO Downloader path: /opt/cloudera/parcel-cache
[17/Dec/2020 10:07:14 +0000] 9333 MainThread parcel_cache INFO Using /opt/cloudera/parcel-cache for parcel cache
[17/Dec/2020 10:07:14 +0000] 9333 MainThread throttling_logger WARNING Failed parsing alternatives line: libnssckbi.so.x86_64 string index out of range link currently points to /usr/lib64/pkcs11/p11-kit-trust.so
[
Created 12-22-2020 10:37 AM
@Raj77 The most significant error message is this:
OSError: Cannot initialize new instance of inotify, Errno=Too many open files (EMFILE)
This means too many file descriptors are open, so the agent cannot create a new inotify instance. An easy way to mitigate this is to hard stop and then start the agent, as described in the documentation.
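Before restarting, it can help to confirm how many descriptors the agent actually holds. The following is a minimal sketch, assuming a Linux host where `/proc` is mounted; the helper names `count_open_fds` and `read_int` are hypothetical and not part of Cloudera Manager. It counts the entries under `/proc/<pid>/fd` and how many of them are inotify instances, then prints the kernel's per-user inotify instance limit. Run it as root and pass the agent's PID as the first argument.

```python
import os
import sys


def count_open_fds(pid, proc_root="/proc"):
    """Return (total_fds, inotify_fds) for a PID by reading <proc_root>/<pid>/fd."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    total = 0
    inotify = 0
    for name in os.listdir(fd_dir):
        total += 1
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        # On Linux, inotify descriptors appear as links to "anon_inode:inotify".
        if target == "anon_inode:inotify":
            inotify += 1
    return total, inotify


def read_int(path):
    """Read a single integer from a file such as a /proc/sys entry; None on failure."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (IOError, OSError, ValueError):
        return None


if __name__ == "__main__":
    # Default to this script's own PID when no PID is given (for a quick self-test).
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    total, inotify = count_open_fds(pid)
    print("PID %d: %d open descriptors, %d inotify instances" % (pid, total, inotify))
    limit = read_int("/proc/sys/fs/inotify/max_user_instances")
    print("fs.inotify.max_user_instances = %s" % limit)
```

Note that the inotify instance limit applies per user rather than per process, so a count for one PID is only part of the picture; other processes running as the same user count against the same limit.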
The second error concerns DnsTest:
[17/Dec/2020 11:31:05 +0000] 9333 DnsResolutionMonitor throttling_logger ERROR Failed to run DnsTest.
This looks like an issue with your Java installation. Most probably you should remove the offending packages (mostly openjdk) from the host, remove any broken alternatives (zero-byte files, or entries shown in red) from /var/lib/alternatives and /etc/alternatives, and then restart the CM agent.
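To locate the broken alternatives, a short script can list the candidates for review. This is a minimal sketch, not a Cloudera tool: the function name `find_broken_alternatives` is hypothetical, and it assumes that "red" entries refer to dangling symlinks, which is how `ls --color` typically displays them. It only reports paths and deletes nothing.

```python
import os


def find_broken_alternatives(admin_dir="/var/lib/alternatives",
                             link_dir="/etc/alternatives"):
    """Return (zero_byte_files, dangling_links) found in the two directories."""
    zero_byte = []
    dangling = []
    if os.path.isdir(admin_dir):
        for name in sorted(os.listdir(admin_dir)):
            path = os.path.join(admin_dir, name)
            # An empty admin file cannot describe a valid alternative group.
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                zero_byte.append(path)
    if os.path.isdir(link_dir):
        for name in sorted(os.listdir(link_dir)):
            path = os.path.join(link_dir, name)
            # os.path.exists follows the link, so it is False for a dangling one.
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(path)
    return zero_byte, dangling


if __name__ == "__main__":
    zero_byte, dangling = find_broken_alternatives()
    for path in zero_byte:
        print("zero-byte admin file: %s" % path)
    for path in dangling:
        print("dangling symlink:     %s" % path)
```

Review the reported paths by hand before removing anything, since other packages on the host may depend on the same alternatives.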