Created 08-09-2018 07:18 AM
Error log on Ambari for Yarn.
Connection failed to http://ip-172-31-18-234.ec2.internal:8042/ws/v1/node/info (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute
    url_response = urllib2.urlopen(query, timeout=connection_timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 111] Connection refused>
Connection failed to http://ip-172-31-18-234.ec2.internal:8042 (<urlopen error [Errno 111] Connection refused >)
Sudo top on Slave
882 yarn  20   0 397272 31392  456 S 1557 0.0 566:11.83 java
  1 root  20   0 194556  7676 4160 S  0.0 0.0   1:31.20 systemd
  2 root  20   0      0     0    0 S  0.0 0.0   0:00.21 kthreadd
  3 root  20   0      0     0    0 S  0.0 0.0   0:07.80 ksoftirqd/0
  5 root   0 -20      0     0    0 S  0.0 0.0   0:00.00 kworker/0:0H
  7 root  rt   0      0     0    0 S  0.0 0.0   0:00.50 migration/0
  8 root  20   0      0     0    0 S  0.0 0.0   0:00.00 rcu_bh
  9 root  20   0      0     0    0 S  0.0 0.0   0:52.80 rcu_sched
 10 root   0 -20      0     0    0 S  0.0 0.0   0:00.00 lru-add-drain
 11 root  rt   0      0     0    0 S  0.0 0.0   0:00.98 watchdog/0
 12 root  rt   0      0     0    0 S  0.0 0.0   0:00.82 watchdog/1
 13 root  rt   0      0     0    0 S  0.0 0.0   0:00.53 migration/1
 14 root  20   0      0     0    0 S  0.0 0.0   0:08.16 ksoftirqd/1
 16 root   0 -20      0     0    0 S  0.0 0.0   0:00.00 kworker/1:0H
 17 root  rt   0      0     0    0 S  0.0 0.0   0:00.82 watchdog/2
 18 root  rt   0      0     0    0 S  0.0 0.0   0:00.50 migration/2
Created 08-09-2018 07:23 AM
Ambari might be starting the NodeManager process fine. However, due to a lack of resources or some other error, the NodeManager might be going down shortly afterwards. That would explain the alert: Ambari cannot reach port 8042 on the mentioned host because the NodeManager is already down by the time the check runs.
So please check whether the NodeManager host has enough resources:
# free -m
# top
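
If you would rather capture the memory figure from a script than read it off `free -m`, a minimal sketch along these lines may help. It assumes a Linux host where /proc/meminfo is readable; the function names `parse_meminfo`, `available_mb`, and `main` are made up for this example and are not part of Ambari or Hadoop.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict (values in kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_mb(meminfo):
    """Return MemAvailable in MB, falling back to MemFree on older kernels."""
    kb = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    return kb // 1024


def main():
    # Assumption: run this on the NodeManager host itself.
    with open("/proc/meminfo") as fh:
        info = parse_meminfo(fh.read())
    print("available MB:", available_mb(info))
```

Call `main()` on the slave node; a very low number here would point towards the resource shortage described above.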
.
Also, please check the NodeManager logs for any errors, and share the following logs:
/var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-*.log
/var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-*.out
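
To pull just the error lines out of those logs before sharing them, something like the following sketch could be used. It assumes the logs follow the usual log4j layout, where the level (ERROR or FATAL) appears as a separate word on each line; `find_error_lines`, `scan_logs`, and `main` are hypothetical helper names for this example.

```python
import glob
import re

# Assumption: the log level is written as a standalone word (log4j style).
LEVEL_RE = re.compile(r"\b(ERROR|FATAL)\b")


def find_error_lines(lines):
    """Return (line_number, text) for every line mentioning ERROR or FATAL."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, 1)
        if LEVEL_RE.search(line)
    ]


def scan_logs(pattern):
    """Scan every file matching the glob pattern; return {path: hits}."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as fh:
            results[path] = find_error_lines(fh)
    return results


def main():
    pattern = "/var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-*.log"
    for path, hits in scan_logs(pattern).items():
        print(path, len(hits), "error lines")
        for number, text in hits[-20:]:  # show only the most recent 20
            print("  %d: %s" % (number, text))
```

Call `main()` as a user that can read the log directory. The stack trace that follows an ERROR line is not captured by this filter, so open the file around the reported line numbers for the full picture.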
.
Please try starting the NodeManager from the command line, and in another terminal tail the NodeManager log to see whether it fails. (This isolates whether the problem is specific to NodeManager startup via Ambari, or whether a manual NodeManager startup fails as well.)
# su -l yarn -c "/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh start nodemanager"
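
After the manual start, it can be useful to confirm whether port 8042 is actually accepting connections, since that is what the Ambari alert is complaining about. The sketch below is a plain TCP connect test, not the Ambari alert script itself; `port_open` and `main` are names chosen for this example, and the host name is the one from the alert text.

```python
import socket


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused" (Errno 111), timeouts, and DNS failures.
        return False


def main():
    host = "ip-172-31-18-234.ec2.internal"  # host taken from the alert message
    print("8042 reachable:", port_open(host, 8042))
```

Call `main()` a few times after the start. If the port answers at first and then stops answering, the NodeManager is most likely starting and then dying, which matches the theory above.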
.
Created 08-09-2018 07:26 AM
Also, as soon as you start the NodeManager, keep an eye on its memory usage to find out whether it is running into memory issues or needs any tuning:
# ps -ef | grep `cat /var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid`
# $JAVA_HOME/bin/jmap -heap `cat /var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid`
# free -m
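
To watch the NodeManager's resident memory over time instead of taking a single snapshot, a rough sketch like this may be enough. It assumes a Linux host with /proc available and reads the same PID file as the commands above; `parse_vmrss_kb`, `read_pid`, and `main` are hypothetical names, and the 5-second interval is an arbitrary choice.

```python
import time


def parse_vmrss_kb(status_text):
    """Extract VmRSS (resident memory in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # kernel threads and exited processes have no VmRSS line


def read_pid(pid_file):
    """Read an integer PID from a PID file."""
    with open(pid_file) as fh:
        return int(fh.read().strip())


def main():
    pid = read_pid("/var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid")
    for _ in range(12):  # sample every 5 seconds for about a minute
        with open("/proc/%d/status" % pid) as fh:
            print("VmRSS kB:", parse_vmrss_kb(fh.read()))
        time.sleep(5)
```

Call `main()` right after starting the NodeManager. If the process exits during sampling, opening /proc/<pid>/status raises an error, which by itself tells you roughly when the NodeManager went down.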
.