I have a 3-node Hortonworks cluster on AWS. When I start the services from Ambari, they work well initially, but after some time the NodeManagers go down, and when I check the machines a yarn process is using 100% CPU. Thanks in advance!

New Contributor
Error log on Ambari for Yarn.


Connection failed to http://ip-172-31-18-234.ec2.internal:8042/ws/v1/node/info (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanager_health.py", line 171, in execute
    url_response = urllib2.urlopen(query, timeout=connection_timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 111] Connection refused> 
Connection failed to http://ip-172-31-18-234.ec2.internal:8042 (<urlopen error [Errno 111] Connection refused>)

Output of sudo top on a slave node:

  PID USER  PR  NI   VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
  882 yarn  20   0 397272  31392   456 S  1557  0.0 566:11.83 java
    1 root  20   0 194556   7676  4160 S   0.0  0.0   1:31.20 systemd
    2 root  20   0      0      0     0 S   0.0  0.0   0:00.21 kthreadd
    3 root  20   0      0      0     0 S   0.0  0.0   0:07.80 ksoftirqd/0
    5 root   0 -20      0      0     0 S   0.0  0.0   0:00.00 kworker/0:0H
    7 root  rt   0      0      0     0 S   0.0  0.0   0:00.50 migration/0
    8 root  20   0      0      0     0 S   0.0  0.0   0:00.00 rcu_bh
    9 root  20   0      0      0     0 S   0.0  0.0   0:52.80 rcu_sched
   10 root   0 -20      0      0     0 S   0.0  0.0   0:00.00 lru-add-drain
   11 root  rt   0      0      0     0 S   0.0  0.0   0:00.98 watchdog/0
   12 root  rt   0      0      0     0 S   0.0  0.0   0:00.82 watchdog/1
   13 root  rt   0      0      0     0 S   0.0  0.0   0:00.53 migration/1
   14 root  20   0      0      0     0 S   0.0  0.0   0:08.16 ksoftirqd/1
   16 root   0 -20      0      0     0 S   0.0  0.0   0:00.00 kworker/1:0H
   17 root  rt   0      0      0     0 S   0.0  0.0   0:00.82 watchdog/2
   18 root  rt   0      0      0     0 S   0.0  0.0   0:00.50 migration/2
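
To pick out the busy process from output like this, a small parser can help. The sketch below assumes the default 12-column top layout (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND); the function names are hypothetical. Note that in top's default mode a multi-threaded process can show more than 100 in the %CPU column, so a value of 1557 would correspond to roughly 15 to 16 cores' worth of CPU time.

```python
def parse_top_rows(text):
    """Parse process rows from default-layout top output into dicts.

    Assumes the 12-column layout; header and blank lines are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # not a process row
        rows.append({
            "pid": int(fields[0]),
            "user": fields[1],
            "cpu": float(fields[8]),   # %CPU column
            "mem": float(fields[9]),   # %MEM column
            "command": " ".join(fields[11:]),
        })
    return rows


def busy_processes(rows, threshold=100.0):
    """Return rows whose %CPU is at or above the threshold."""
    return [row for row in rows if row["cpu"] >= threshold]


# First two rows copied from the top output in the question.
sample = """882 yarn 20 0 397272 31392 456 S 1557 0.0 566:11.83 java
1 root 20 0 194556 7676 4160 S 0.0 0.0 1:31.20 systemd"""

hot = busy_processes(parse_top_rows(sample))
for row in hot:
    # cpu / 100 approximates the number of fully busy cores
    print(row["pid"], row["user"], row["command"], row["cpu"] / 100.0)
```

Here only PID 882, the java process owned by yarn, passes the threshold.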

2 REPLIES

Super Mentor

@Ankit Dwivedi

Ambari may be starting the NodeManager process successfully. However, because of a lack of resources or some other error, the NodeManager may be going down after a while. Ambari then raises the NodeManager alert because it cannot reach port 8042 on the host mentioned: the NodeManager is already down by the time the check runs.
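
To check by hand whether the NodeManager web endpoint is reachable, something like the following could be run from any host that can reach the node. It is only a rough sketch in Python 3 that requests the same /ws/v1/node/info URL shown in the alert; the function names and the default timeout are assumptions, and it is not the Ambari alert script itself.

```python
import json
import urllib.error
import urllib.request


def nodemanager_info_url(host, port=8042):
    """Build the NodeManager REST URL that appears in the Ambari alert."""
    return "http://{}:{}/ws/v1/node/info".format(host, port)


def probe_nodemanager(host, port=8042, timeout=5.0):
    """Return (reachable, detail) for the NodeManager web endpoint.

    detail is the parsed JSON (or raw body) on success, or the error
    text on failure.
    """
    url = nodemanager_info_url(host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError) as err:
        # "Connection refused" ends up here when nothing listens on the port.
        return False, str(err)
    try:
        return True, json.loads(body)
    except ValueError:
        return True, body
```

For example, `probe_nodemanager("ip-172-31-18-234.ec2.internal")` should return `False` plus the error text while the NodeManager is down, which would match the "Connection refused" in the alert.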

So please check whether the NodeManager host has enough resources:

# free -m
# top


Also, please check the NodeManager logs for errors, and share the following logs:

/var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-*.log
/var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-*.out
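
If the logs are large, a short script can pull out just the suspicious lines. This sketch assumes the log lines carry the usual ERROR / FATAL level markers; the function names and the `last` parameter are made up for illustration.

```python
import glob
import re

# Assumption: log lines contain a level token such as ERROR or FATAL.
LEVEL_RE = re.compile(r"\b(ERROR|FATAL)\b")

LOG_PATTERN = "/var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-*.log"


def find_error_lines(lines):
    """Return the lines that contain an ERROR or FATAL marker."""
    return [line.rstrip("\n") for line in lines if LEVEL_RE.search(line)]


def scan_logs(pattern=LOG_PATTERN, last=20):
    """Map each matching log file to its last few ERROR/FATAL lines."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as handle:
            results[path] = find_error_lines(handle)[-last:]
    return results


if __name__ == "__main__":
    for path, lines in scan_logs().items():
        print(path)
        for line in lines:
            print("   ", line)
```

The lines just before the NodeManager went down are usually the most useful ones to share.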



Please try starting the NodeManager from the command line and, in another terminal, tail the NodeManager log to see whether it fails. (This isolates whether the problem is specific to starting the NodeManager through Ambari or whether a manual start fails too.)

# su -l yarn -c "/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh start nodemanager"

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_reference/content/starting_hdp_services....


Super Mentor

@Ankit Dwivedi

Also, as soon as you start the NodeManager, keep an eye on its memory usage to find out whether it has memory issues or needs tuning:

# ps -ef | grep `cat /var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid`
# $JAVA_HOME/bin/jmap -heap `cat /var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid`
# free -m
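
To watch the process over time rather than with one-off commands, a rough Python sketch like the one below could be left running. It reads the same PID file used in the commands above and takes the resident memory (VmRSS) from /proc on Linux; the function names are hypothetical, and it only covers resident memory, not the JVM heap detail that jmap reports.

```python
PID_FILE = "/var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid"


def read_pid(path=PID_FILE):
    """Read the NodeManager PID from its PID file."""
    with open(path) as handle:
        return int(handle.read().strip())


def parse_vmrss_kb(status_text):
    """Extract VmRSS (resident memory, in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # kernel threads and exited processes have no VmRSS line


def resident_memory_kb(pid):
    """Return the resident memory of a running process in kB (Linux only)."""
    with open("/proc/{}/status".format(pid)) as handle:
        return parse_vmrss_kb(handle.read())
```

Calling `resident_memory_kb(read_pid())` every few seconds and printing the result would show whether memory keeps growing before the NodeManager goes down.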
