Heartbeat Lost on worker machine

We recently added worker06 to the Ambari cluster.

After restarting ambari-agent, the worker machine shows "Heartbeat Lost". Before the ambari-agent restart, the worker's heartbeat was OK.

What could be the reason for that?

The ambari-agent log shows the following:

ERROR 2017-11-26 08:27:09,659 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://worker06.sys58.com:8042/ws/v1/node/info (Traceback (most recent call last):\n  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute\n    url_response = urllib2.urlopen(query, timeout=connection_timeout)\n  File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen\n    return opener.open(url, data, timeout)\n  File "/usr/lib64/python2.7/urllib2.py", line 431, in open\n    response = self._open(req, data)\n  File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n    \'_open\', req)\n  File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain\n    result = func(*args)\n  File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n    return self.do_open(httplib.HTTPConnection, req)\n  File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n    raise URLError(err)\nURLError: <urlopen error [Errno 111] Connection refused>\n)']
Michael-Bronson
1 ACCEPTED SOLUTION

The problem is solved. The hosts file /etc/hosts was misconfigured: it contained the wrong IP address for the host.

We corrected the hosts file and also fixed the DNS configuration, and this solved the problem.
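
For anyone hitting the same symptom, here is a minimal sketch of how the name resolution and the port from the alert could be checked from the affected machine. It is only an illustration, not something taken from Ambari: the helper names resolve_ipv4 and port_is_open are made up for this example. A wrong /etc/hosts entry could make the hostname resolve to an address where nothing listens on port 8042, which would be consistent with the "Connection refused" in the log.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the local resolver gives for hostname.

    getaddrinfo uses the system resolver, so on a typical Linux setup it
    reflects /etc/hosts and DNS in whatever order the system is configured.
    Returns an empty list if the name cannot be resolved.
    (Hypothetical helper for illustration only.)
    """
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A refused connection (Errno 111), a timeout, or a resolution failure
    all count as "not open".
    (Hypothetical helper for illustration only.)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On the worker, calling resolve_ipv4("worker06.sys58.com") and comparing the result with the machine's real IP address shows whether the hosts file or DNS is returning a wrong address, and port_is_open("worker06.sys58.com", 8042) repeats the connection that the alert script attempted.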

Michael-Bronson
