we add recently the worker06 to the mabari cluster
after ambari-agent restart
we see that worker machine have heartbeat loos
from the ambari-agent log we can see the following:
before the ambari-agent restart worker machine heartbeat was ok ,
so what chould be the reson for that?
ERROR 2017-11-26 08:27:09,659 script_alert.py:123 - [Alert][yarn_nodemanager_health] Failed with result CRITICAL: ['Connection failed to http://work
er06.sys58.com:8042/ws/v1/node/info (Traceback (most recent call last):\n File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/packa
ge/alerts/alert_nodemanager_health.py", line 171, in execute\n url_response = urllib2.urlopen(query, timeout=connection_timeout)\n File "/usr/li
b64/python2.7/urllib2.py", line 154, in urlopen\n return opener.open(url, data, timeout)\n File "/usr/lib64/python2.7/urllib2.py", line 431, in
open\n response = self._open(req, data)\n File "/usr/lib64/python2.7/urllib2.py", line 449, in _open\n \'_open\', req)\n File "/usr/lib64/py
thon2.7/urllib2.py", line 409, in _call_chain\n result = func(*args)\n File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open\n retu
rn self.do_open(httplib.HTTPConnection, req)\n File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open\n raise URLError(err)\nURLError: <u
rlopen error [Errno 111] Connection refused>\n)']
Michael-Bronson