08-12-2016
09:03 AM
Host health becomes concerning at regular intervals: the host turns amber and then goes back to green. The health test that changes state is:
DNS Resolution.
The hostname and canonical name for this host are consistent when checked from a Java process. The most recent DNS resolution took 5 second(s). Warning threshold: 1 second(s).
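To catch the slow lookups outside of Cloudera Manager, something like the following could be run on the affected host. It is only a minimal sketch: the function names `time_resolution` and `probe` are made up for illustration, the 1 second threshold is copied from the health test message above, and it times `socket.getaddrinfo` rather than the Java lookup that the health test itself performs.

```python
import socket
import time

# Threshold quoted in the health test message; an assumption that it is the right cut-off here.
WARNING_THRESHOLD_SECONDS = 1.0


def time_resolution(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname once and return (elapsed_seconds, error_or_None)."""
    start = time.monotonic()
    try:
        resolver(hostname, None)
        error = None
    except socket.gaierror as exc:
        # Covers "[Errno -3] Temporary failure in name resolution".
        error = exc
    return time.monotonic() - start, error


def probe(hostname, attempts=5, pause=1.0, resolver=socket.getaddrinfo):
    """Resolve hostname several times; return only the slow or failed attempts."""
    problems = []
    for attempt in range(1, attempts + 1):
        elapsed, error = time_resolution(hostname, resolver)
        if error is not None or elapsed > WARNING_THRESHOLD_SECONDS:
            problems.append((attempt, elapsed, error))
        if attempt < attempts:
            time.sleep(pause)
    return problems
```

Calling `probe("xxx")` with the real hostname, ideally in a loop over a longer period, returns a list of `(attempt, elapsed, error)` tuples for the lookups that were slow or failed. An empty list means every lookup stayed under the threshold.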
I checked /etc/resolv.conf, and it lists two name servers.
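One detail that may be relevant: the glibc resolver waits 5 seconds by default (the `timeout` option) before moving on to the next name server, which matches the 5 second figure in the warning. So one possible explanation, not confirmed here, is that the first name server intermittently fails to answer. The sketch below lists the name servers and any `options` set in a resolv.conf file so the effective timeout can be checked. The function name `parse_resolv_conf` is hypothetical and the addresses in the usage note are placeholders.

```python
def parse_resolv_conf(text):
    """Return (nameservers, options) parsed from resolv.conf content."""
    nameservers = []
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines, which start with '#' or ';'.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "options":
            for opt in fields[1:]:
                # Options look like "timeout:1" or a bare flag such as "rotate".
                name, _, value = opt.partition(":")
                options[name] = value if value else True
    return nameservers, options
```

For example, `parse_resolv_conf(open("/etc/resolv.conf").read())` on a file containing `nameserver 192.0.2.1`, `nameserver 192.0.2.2` and `options timeout:1 rotate` gives both addresses and `{"timeout": "1", "rotate": True}`. If no `timeout` option appears, the 5 second default applies.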
Checking the Cloudera agent logs on one of the hosts, I see these errors (hostname replaced with xxx):
Monitor-GenericMonitor throttling_logger ERROR Error fetching metrics at 'http://xxx:8088/jmx'
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/monitor/generic/metric_collectors.py", line 200, in _collect_and_parse_and_return
    self._adapter.safety_valve))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/url_util.py", line 166, in urlopen_with_retry_on_authentication_errors
    return function()
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/monitor/generic/metric_collectors.py", line 217, in _open_url
    password=self._password_value)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/url_util.py", line 66, in urlopen_with_timeout
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
At the same time that this test turns amber, we also get alerts like this one:
The health test result for RESOURCE_MANAGER_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent is not able to communicate with this role's web server.
Labels: Apache Hadoop, Cloudera Manager