Support Questions

Find answers, ask questions, and share your expertise

RLError: <urlopen error [Errno -3] Temporary failure in name resolution>

avatar
Explorer

Host health is concerning at regular intervals. IT gets amber and then green.

 

  • The hostname and canonical name for this host are consistent when checked from a Java process. The most recent DNS resolution took 5 second(s). Warning threshold: 1 second(s).

I checked /etc/resolv.conf and there we have two name servers listed.

Checking the cloudera agent logs on one of them and i see these errors in the logs. (replaced the hostname with xxx)

 

 

Monitor-GenericMonitor throttling_logger ERROR    Error fetching metrics at 'http://xxx:8088/jmx'
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/monitor/generic/metric_collectors.py", line 200, in _collect_and_parse_and_return
    self._adapter.safety_valve))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/url_util.py", line 166, in urlopen_with_retry_on_authentication_errors
    return function()
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/monitor/generic/metric_collectors.py", line 217, in _open_url
    password=self._password_value)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.7.1-py2.6.egg/cmf/url_util.py", line 66, in urlopen_with_timeout
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno -3] Temporary failure in name resolution>

 

 

The same time when this test becomes amber then we get these alerts as well.

 

The health test result for RESOURCE_MANAGER_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent is not able to communicate with this role's web server.

1 REPLY 1

avatar
Mentor
The error "Temporary failure in name resolution" comes out of the DNS lookup sub-system on your OS, and likely indicates a fault of some sort when accessing one or more of your nameservers (defined in /etc/resolv.conf).

If this is a repeating yet intermittent problem, I'd recommend contacting the DNS maintainers to find out if there are maintenance events or other downtime related issues ongoing with their servers. You can also check your /var/log/messages or "dmesg" contents for more clues about this lower-env trouble.

The RM and other alerts you see coming out as a result of this failure is an avalanche effect. The agent polls metrics and states from the roles it runs, by contacting their webserver end-points. Since that's failing to resolve (its really a local address, shouldn't have to go through DNS if your /etc/nsswitch.conf is setup right) the alert gets flagged too.

Its worth also running a local nameservice caching daemon (Such as nscd, etc.) to help cushion such effects to a certain degree and also to prevent overloading the DNS with too many queries which could also cause this potentially.