Host health becomes concerning at regular intervals: it turns amber and then goes back to green.
I checked /etc/resolv.conf and it lists two name servers.
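For anyone following along, here is a minimal sketch of listing the nameserver entries in the order they appear in the file. The order matters because the glibc resolver tries the entries in sequence. The function name parse_nameservers is made up for this example, and the sketch targets Python 3 rather than the agent's bundled Python 2.6.

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses in the order they are listed."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines (resolv.conf comments start with '#' or ';').
        if not stripped or stripped.startswith(("#", ";")):
            continue
        fields = stripped.split()
        # Keep only "nameserver <address>" entries; ignore search, options, etc.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers
```

Passing it the contents of /etc/resolv.conf (for example via open("/etc/resolv.conf").read()) gives the list to check each server against. If the first entry is slow or unreachable, lookups may stall before the second one is tried, which is one possible explanation for intermittent failures, though not a confirmed one.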
Checking the Cloudera agent logs on one of the hosts, I see these errors (hostname replaced with xxx):
Monitor-GenericMonitor throttling_logger ERROR Error fetching metrics at 'http://xxx:8088/jmx'
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/monitor/generic/metric_collectors.py", line 200, in _collect_and_parse_and_return
    self._adapter.safety_valve))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/url_util.py", line 166, in urlopen_with_retry_on_authentication_errors
    return function()
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/monitor/generic/metric_collectors.py", line 217, in _open_url
    password=self._password_value)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/url_util.py", line 66, in urlopen_with_timeout
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
At the same time the test turns amber, we also get this alert:
The health test result for RESOURCE_MANAGER_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent is not able to communicate with this role's web server.
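The "[Errno -3] Temporary failure in name resolution" in the traceback is what the resolver returns for a transient lookup failure (EAI_AGAIN on Linux). To see whether it lines up with the amber periods, something like the sketch below could be run on the affected host. The function names check_resolution and probe are made up for this example, the code is written for Python 3, and it goes through the system resolver with socket.getaddrinfo rather than querying each name server individually.

```python
import socket
import time


def check_resolution(hostname, resolver=socket.getaddrinfo):
    """Attempt one lookup; return (ok, elapsed_seconds, error_text)."""
    start = time.time()
    try:
        resolver(hostname, None)
        return True, time.time() - start, ""
    except socket.gaierror as exc:
        # str(exc) looks like "[Errno -3] Temporary failure in name resolution".
        return False, time.time() - start, str(exc)


def probe(hostname, attempts=5, interval=1.0, resolver=socket.getaddrinfo):
    """Repeat the lookup several times, pausing between attempts."""
    results = []
    for i in range(attempts):
        results.append(check_resolution(hostname, resolver))
        if i < attempts - 1:
            time.sleep(interval)
    return results
```

Calling probe("xxx", attempts=60, interval=5) with the real hostname and logging each result with a timestamp would show whether lookups fail or slow down at the same intervals the health test goes amber. The resolver parameter exists only so the functions can be exercised without a network.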