<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Nodemanager bad health and connection refused in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218714#M180615</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/92177/danielzafar.html" nodeid="92177" target="_blank"&gt;@Daniel
 Zafar
&lt;/A&gt; &lt;/P&gt;&lt;P&gt;Your NodeManager command execution was fine however the Netstat command did not show any Port Listening on 8042 means the NodeManager was not actually started successfully.&lt;/P&gt;&lt;PRE&gt;# netstat -tnlpa | grep 8042&lt;/PRE&gt;&lt;P&gt;.&lt;/P&gt;&lt;P&gt;Can you please check and share the NM logs.&lt;/P&gt;&lt;P&gt;Also regarding Installing NodeManager on other nodes ... it is quite easy and can be done via ambari UI as following:&lt;/P&gt;&lt;P&gt;Ambari UI --&amp;gt; Hosts (Tab) --&amp;gt; Click on the desired host link --&amp;gt; Click "Add" button (on the Components Panel)  and then choose NodeManager from the drop down &lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="86436-add-nodemanager.png" style="width: 1524px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/16051iE8226C0C06B18623/image-size/medium?v=v2&amp;amp;px=400" role="button" title="86436-add-nodemanager.png" alt="86436-add-nodemanager.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;.&lt;/P&gt;&lt;P&gt;Similarly if you want to delete a NodeManager from a particular host then do the same:&lt;/P&gt;&lt;P&gt;Ambari UI --&amp;gt; Hosts (Tab) --&amp;gt; Click on the desired host link --&amp;gt; On the host page Click on the "NodeManager" dropdown menu. After Stopping NodeManager you will see option to "Delete" the NodeManager.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="86435-delete-nm.png" style="width: 1516px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/16052iFF51FD7A73A9D336/image-size/medium?v=v2&amp;amp;px=400" role="button" title="86435-delete-nm.png" alt="86435-delete-nm.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;.&lt;/P&gt;</description>
    <pubDate>Sun, 18 Aug 2019 02:47:36 GMT</pubDate>
    <dc:creator>jsensharma</dc:creator>
    <dc:date>2019-08-18T02:47:36Z</dc:date>
    <item>
      <title>Nodemanager bad health and connection refused</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218711#M180612</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3418/jsensharma.html" nodeid="3418"&gt;@Jay Kumar SenSharma&lt;/A&gt; maybe you can help me with this one instead?&lt;/P&gt;&lt;P&gt;I have a 4-node cluster. All four are datanodes and one node is also the resource-manager. My ambari installation only installed a node-manager on my master resource-manager node. Assuming this is correct (please let me know if it is not), I have been getting errors about my node-manager. It says the health is bad because it cannot connect:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;PRE&gt;Connection failed to &lt;A href="http://ncienspk01.nciwin.local:8042/ws/v1/node/info" target="_blank"&gt;http://ncienspk01.nciwin.local:8042/ws/v1/node/info&lt;/A&gt; (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/alerts/alert_nodemanager_health.py", line 171, in execute
    url_response = urllib2.urlopen(query, timeout=connection_timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open
    raise URLError(err)
URLError: &amp;lt;urlopen error [Errno 111] Connection refused&amp;gt;
)&lt;/PRE&gt;&lt;P&gt;Many of my services had corrupt installs, and I re-installed them. That may be the case here as well. Any thoughts on how to re-install the NodeManager?&lt;BR /&gt;Also, should I have a NodeManager on every node? If so, how do I install and connect them?&lt;/P&gt;&lt;P&gt;Thanks for your help! Dan&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 05:07:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218711#M180612</guid>
      <dc:creator>daniel_zafar</dc:creator>
      <dc:date>2018-08-10T05:07:19Z</dc:date>
    </item>
    <item>
      <title>Re: Nodemanager bad health and connection refused</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218712#M180613</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/92177/danielzafar.html" nodeid="92177"&gt;@Daniel
 Zafar
&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Error indicates that Nodemanager is not started successfully or might be down hence the port 8042 is not accessible.&lt;/P&gt;&lt;P&gt;May be you can try starting the NodeManager manually using command line to isolate the issue (if it starts fine without ambari) Because ambari also performs the Nodemager health validation during startup.&lt;/P&gt;&lt;PRE&gt;# su -l yarn -c "/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh start nodemanager"&lt;/PRE&gt;&lt;P&gt;Then verify if the port 8042 is opened or not?&lt;/P&gt;&lt;PRE&gt;# netstat -tnlpa | grep 8042&lt;/PRE&gt;&lt;P&gt;.&lt;/P&gt;&lt;P&gt;Also once the NodeManager is started via command line then please check the NodeManager logs and Free Memory available on the host.&lt;/P&gt;&lt;P&gt;Logs:&lt;/P&gt;&lt;PRE&gt;/var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-*.log&amp;lt;br&amp;gt;/var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-*.out&lt;/PRE&gt;&lt;P&gt;Memory:&lt;/P&gt;&lt;PRE&gt;# ps -ef | grep `cat /var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid`
# $JAVA_HOME/bin/jmap -heap `cat /var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid`
# free -m&lt;/PRE&gt;&lt;P&gt;NodeManager can be installed on all cluster nodes, so that the ResourceManager has more nodes available to run containers. For a 4-node cluster, I would suggest installing it on all 4 nodes (or at least 3). Installing NodeManager on a single node might cause very slow processing of your jobs.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 05:15:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218712#M180613</guid>
      <dc:creator>jsensharma</dc:creator>
      <dc:date>2018-08-10T05:15:02Z</dc:date>
    </item>
    <item>
      <title>Re: Nodemanager bad health and connection refused</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218713#M180614</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3418/jsensharma.html" nodeid="3418"&gt;@Jay Kumar SenSharma&lt;/A&gt;
&lt;/P&gt;&lt;P&gt;I was able to start the node-manager from the command line with no issue. &lt;/P&gt;&lt;PRE&gt;[root@NCIENSPK01 ~]#  su -l yarn -c "/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh start nodemanager"
WARNING: Use of this script to start YARN daemons is deprecated.
WARNING: Attempting to execute replacement "yarn --daemon start" instead.
[root@NCIENSPK01 ~]#
&lt;/PRE&gt;&lt;P&gt;port?&lt;/P&gt;&lt;PRE&gt;[root@NCIENSPK01 ~]# netstat -tnlpa | grep 8042
[root@NCIENSPK01 ~]#
&lt;/PRE&gt;&lt;P&gt;memory?&lt;/P&gt;&lt;PRE&gt;[root@NCIENSPK01 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          40072        6305       30089          51        3678       33190
Swap:          8063           0        8063&lt;/PRE&gt;&lt;P&gt;Can you please show me how to re-install the NodeManager on this node, and how to do a fresh install (plus any configuration needed) on the other nodes?&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 05:26:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218713#M180614</guid>
      <dc:creator>daniel_zafar</dc:creator>
      <dc:date>2018-08-10T05:26:17Z</dc:date>
    </item>
    <item>
      <title>Re: Nodemanager bad health and connection refused</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218714#M180615</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/92177/danielzafar.html" nodeid="92177" target="_blank"&gt;@Daniel
 Zafar
&lt;/A&gt; &lt;/P&gt;&lt;P&gt;Your NodeManager command execution was fine however the Netstat command did not show any Port Listening on 8042 means the NodeManager was not actually started successfully.&lt;/P&gt;&lt;PRE&gt;# netstat -tnlpa | grep 8042&lt;/PRE&gt;&lt;P&gt;.&lt;/P&gt;&lt;P&gt;Can you please check and share the NM logs.&lt;/P&gt;&lt;P&gt;Also regarding Installing NodeManager on other nodes ... it is quite easy and can be done via ambari UI as following:&lt;/P&gt;&lt;P&gt;Ambari UI --&amp;gt; Hosts (Tab) --&amp;gt; Click on the desired host link --&amp;gt; Click "Add" button (on the Components Panel)  and then choose NodeManager from the drop down &lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="86436-add-nodemanager.png" style="width: 1524px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/16051iE8226C0C06B18623/image-size/medium?v=v2&amp;amp;px=400" role="button" title="86436-add-nodemanager.png" alt="86436-add-nodemanager.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;.&lt;/P&gt;&lt;P&gt;Similarly if you want to delete a NodeManager from a particular host then do the same:&lt;/P&gt;&lt;P&gt;Ambari UI --&amp;gt; Hosts (Tab) --&amp;gt; Click on the desired host link --&amp;gt; On the host page Click on the "NodeManager" dropdown menu. After Stopping NodeManager you will see option to "Delete" the NodeManager.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="86435-delete-nm.png" style="width: 1516px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/16052iFF51FD7A73A9D336/image-size/medium?v=v2&amp;amp;px=400" role="button" title="86435-delete-nm.png" alt="86435-delete-nm.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;.&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 02:47:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218714#M180615</guid>
      <dc:creator>jsensharma</dc:creator>
      <dc:date>2019-08-18T02:47:36Z</dc:date>
    </item>
    <item>
      <title>Re: Nodemanager bad health and connection refused</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218715#M180616</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3418/jsensharma.html" nodeid="3418"&gt;@Jay Kumar SenSharma&lt;/A&gt;I think it's pretty clear that I have an issue with my NodeManager and need to re-install it. Other things as well?
&lt;/P&gt;&lt;P&gt;Here are my logs:&lt;/P&gt;&lt;PRE&gt;2018-08-09 17:18:33,029 INFO  service.AbstractService (AbstractService.java:noteFailure(267)) - Service org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed in state INITED
java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
        at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:167)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:473)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:929)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:997)
2018-08-09 17:18:33,030 INFO  service.AbstractService (AbstractService.java:noteFailure(267)) - Service org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl failed in state INITED
org.apache.hadoop.service.ServiceStateException: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:473)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:929)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:997)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
        at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:167)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        ... 8 more



2018-08-09 17:18:33,031 INFO  service.AbstractService (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state INITED
org.apache.hadoop.service.ServiceStateException: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:473)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:929)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:997)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
        at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:167)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        ... 8 more
2018-08-09 17:18:33,032 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(210)) - Stopping NodeManager metrics system...
2018-08-09 17:18:33,032 INFO  impl.MetricsSinkAdapter (MetricsSinkAdapter.java:publishMetricsFromQueue(141)) - timeline thread interrupted.
2018-08-09 17:18:33,034 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(216)) - NodeManager metrics system stopped.
2018-08-09 17:18:33,034 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(607)) - NodeManager metrics system shutdown complete.
2018-08-09 17:18:33,034 ERROR nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(932)) - Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:473)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:929)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:997)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
        at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:167)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        ... 8 more
2018-08-09 17:18:33,036 INFO  nodemanager.NodeManager (LogAdapter.java:info(51)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at NCIENSPK01.nciwin.local/10.96.26.90
************************************************************/
&lt;/PRE&gt;</description>
      <pubDate>Fri, 10 Aug 2018 06:40:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218715#M180616</guid>
      <dc:creator>daniel_zafar</dc:creator>
      <dc:date>2018-08-10T06:40:00Z</dc:date>
    </item>
    <item>
      <title>Re: Nodemanager bad health and connection refused</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218716#M180617</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/3418/jsensharma.html" nodeid="3418"&gt;@Jay Kumar SenSharma&lt;/A&gt;&lt;P&gt;As you instructed I deleted nodemanager from the main node then added it to all four nodes. Now I have a node manager on each node. Unfortunately none of them work. I still get the above errors on each node. They all have the same lines:&lt;/P&gt;&lt;PRE&gt;ERROR nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(932)) - Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:473)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:929)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:997)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
...
&lt;/PRE&gt;&lt;P&gt;It seems like the YARN install is corrupted, since it is missing this core class. Is that correct? What is the solution? It's also worth mentioning that my YARN Timeline Service has never worked; I have it in maintenance mode so that I can start the cluster. Maybe that is a symptom of the present issue?&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 12:09:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218716#M180617</guid>
      <dc:creator>daniel_zafar</dc:creator>
      <dc:date>2018-08-10T12:09:01Z</dc:date>
    </item>
    <item>
      <title>Re: Nodemanager bad health and connection refused</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218717#M180618</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/92177/danielzafar.html" nodeid="92177"&gt;@Daniel Zafar&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Do you have a JAR like the following present in your cluster? The version might be slightly different in your case.&lt;/P&gt;&lt;PRE&gt;/usr/hdp/3.0.0.0-1634/spark2/aux/spark-2.3.1.3.0.0.0-1634-yarn-shuffle.jar&lt;/PRE&gt;&lt;P&gt;Do you have Spark2 installed in your cluster?&lt;/P&gt;&lt;P&gt;Please check the &lt;STRONG&gt;"yarn.nodemanager.aux-services"&lt;/STRONG&gt; property of the YARN service. You will probably find a value like the following, which includes the spark2 shuffle service:&lt;/P&gt;&lt;PRE&gt;mapreduce_shuffle,spark2_shuffle,{{timeline_collector}}&lt;/PRE&gt;&lt;P&gt;The spark2_shuffle entry makes the NodeManager load org.apache.spark.network.yarn.YarnShuffleService, the class missing from your stack trace. That class is shipped in the yarn-shuffle JAR above.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 12:13:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218717#M180618</guid>
      <dc:creator>jsensharma</dc:creator>
      <dc:date>2018-08-10T12:13:31Z</dc:date>
    </item>
    <item>
      <title>Re: Nodemanager bad health and connection refused</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218718#M180619</link>
      <description>&lt;P&gt;The Spark YARN shuffle JAR is missing from your server, which is causing the NodeManager failure.&lt;/P&gt;&lt;P&gt;Please check these paths:&lt;/P&gt;&lt;P&gt;If you have Spark installed: /usr/hdp/&amp;lt;hdp-version&amp;gt;/spark/aux/&lt;/P&gt;&lt;P&gt;If you have Spark2 installed: /usr/hdp/&amp;lt;hdp-version&amp;gt;/spark2/aux/&lt;/P&gt;&lt;P&gt;The JAR name looks similar to spark-&amp;lt;sparkversion&amp;gt;.&amp;lt;hdpversion&amp;gt;-yarn-shuffle.jar&lt;/P&gt;&lt;P&gt;If this file is not present, you can copy the JAR from any other host where the NodeManager is working fine. Copy it into that path and then start the NodeManager service.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 12:49:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218718#M180619</guid>
      <dc:creator>pkadam</dc:creator>
      <dc:date>2018-08-10T12:49:49Z</dc:date>
    </item>
    <item>
      <title>Re: Nodemanager bad health and connection refused</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218719#M180620</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3418/jsensharma.html" nodeid="3418"&gt;@Jay Kumar SenSharma&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Thanks for troubleshooting with me. I don't have the jar you pointed at:&lt;/P&gt;&lt;PRE&gt;[root@NCIENSPK01 ~]# ls /usr/hdp/3.0.0.0-1634/spark2
aux   data      jars      NOTICE  README.md  standalone-metastore
bin   doc       LICENSE   python  RELEASE    work
conf  examples  licenses  R       sbin       yarn
[root@NCIENSPK01 ~]# ls /usr/hdp/3.0.0.0-1634/spark2/aux
[root@NCIENSPK01 ~]#&lt;/PRE&gt;&lt;P&gt;Here is that config: for yarn.nodemanager.aux-services I have the following value:&lt;/P&gt;&lt;PRE&gt;mapreduce_shuffle,spark2_shuffle,{{timeline_collector}}&lt;/PRE&gt;&lt;P&gt;What is the next step? Should I re-install spark2?&lt;/P&gt;&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/83437/pkadam.html" nodeid="83437"&gt;@Pankaj Kadam&lt;/A&gt; I do not have any NodeManagers working in my cluster. I believe there was a corrupt installation. I have not yet run a successful job on this cluster.&lt;BR /&gt;&lt;BR /&gt;Note: Timeline Service Reader V2.0 is also failing with this error:&lt;/P&gt;&lt;PRE&gt;resource_management.core.exceptions.ExecuteTimeoutException: Execution of 'ambari-sudo.sh su yarn-ats -l -s /bin/bash -c 'export  PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/texlive/2016/bin/x86_64-linux:/usr/local/texlive/2016/bin/x86_64-linux:/usr/local/texlive/2016/bin/x86_64-linux:/usr/lib64/qt-3.3/bin:/usr/local/texlive/2016/bin/x86_64-linux:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/maven/bin:/root/bin:/opt/maven/bin:/opt/maven/bin:/var/lib/ambari-agent'"'"' ; sleep 10;export HBASE_CLASSPATH_PREFIX=/usr/hdp/3.0.0.0-1634/hadoop-yarn/timelineservice/*; /usr/hdp/3.0.0.0-1634/hbase/bin/hbase --config /usr/hdp/3.0.0.0-1634/hadoop/conf/embedded-yarn-ats-hbase org.apache.hadoop.yarn.server.timelineservice.storage.TimelineSchemaCreator -Dhbase.client.retries.number=35 -create -s'' was killed due timeout after 300 seconds&lt;/PRE&gt;</description>
      <pubDate>Fri, 10 Aug 2018 22:13:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218719#M180620</guid>
      <dc:creator>daniel_zafar</dc:creator>
      <dc:date>2018-08-10T22:13:53Z</dc:date>
    </item>
    <item>
      <title>Re: Nodemanager bad health and connection refused</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218720#M180621</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/3418/jsensharma.html" nodeid="3418"&gt;@Jay Kumar SenSharma&lt;/A&gt;&lt;BR /&gt;&lt;P&gt;A few updates....&lt;BR /&gt;&lt;BR /&gt;I used the commands:&lt;BR /&gt;&lt;/P&gt;&lt;PRE&gt;yum remove spark2_3_0_0_0_1634-yarn-shuffle
yum install spark2_3_0_0_0_1634-yarn-shuffle&lt;/PRE&gt;&lt;P&gt;to re-install spark2 yarn shuffle and like magic I found the jar:&lt;BR /&gt;&lt;/P&gt;&lt;PRE&gt;[root@NCIENSPK01 ~]# ls /usr/hdp/3.0.0.0-1634/spark2/aux/
spark-2.3.1.3.0.0.0-1634-yarn-shuffle.jar&lt;/PRE&gt;&lt;P&gt;BUT UNFORTUNATELY this deleted a lot of my core packages. So I had to re-install lots of core files from repo:&lt;BR /&gt;&lt;/P&gt;&lt;PRE&gt;yum install hadoop hadoop-hdfs hadoop-libhdfs hadoop-yarn hadoop-mapreduce hadoop-client openssl&lt;/PRE&gt;&lt;P&gt;Now I'm getting this error when I try to start resourcemanager and nodemanager&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;PRE&gt;resource_management.core.exceptions.ExecutionFailed: Execution of 'ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/hdp/3.0.0.0-1634/hadoop/libexec &amp;amp;&amp;amp; /usr/hdp/3.0.0.0-1634/hadoop-yarn/bin/yarn --config /usr/hdp/3.0.0.0-1634/hadoop/conf --daemon start nodemanager' returned 1. ERROR: Hadoop common not found.&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;Please help &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 11 Aug 2018 00:54:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218720#M180621</guid>
      <dc:creator>daniel_zafar</dc:creator>
      <dc:date>2018-08-11T00:54:24Z</dc:date>
    </item>
    <item>
      <title>Re: Nodemanager bad health and connection refused</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218721#M180622</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3418/jsensharma.html" nodeid="3418"&gt;@Jay Kumar SenSharma&lt;/A&gt;&lt;BR /&gt;When I try to start services now I'm getting:&lt;BR /&gt;&lt;BR /&gt;For &lt;STRONG&gt;HDFS Client Install&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;RuntimeError: Failed to execute command '/usr/bin/yum -y install hadoop_3_0_0_0_1634', exited with code '1', message: 'Error unpacking rpm package hadoop_3_0_0_0_1634-3.1.0.3.0.0.0-1634.x86_64'&lt;/PRE&gt;&lt;P&gt;For &lt;STRONG&gt;Hive Client Install&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;RuntimeError: Failed to execute command '/usr/bin/yum -y install hive_3_0_0_0_1634-hcatalog', exited with code '1', message: 'Error unpacking rpm package hadoop_3_0_0_0_1634-3.1.0.3.0.0.0-1634.x86_64&lt;/PRE&gt;</description>
      <pubDate>Sat, 11 Aug 2018 02:29:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218721#M180622</guid>
      <dc:creator>daniel_zafar</dc:creator>
      <dc:date>2018-08-11T02:29:04Z</dc:date>
    </item>
    <item>
      <title>Re: Nodemanager bad health and connection refused</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218722#M180623</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3418/jsensharma.html" nodeid="3418"&gt;@Jay Kumar SenSharma&lt;/A&gt; I'm definitely in a jam now. Really hoping you can help me. A bit scared to touch anything at this point.&lt;/P&gt;</description>
      <pubDate>Sat, 11 Aug 2018 04:20:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218722#M180623</guid>
      <dc:creator>daniel_zafar</dc:creator>
      <dc:date>2018-08-11T04:20:50Z</dc:date>
    </item>
    <item>
      <title>Re: Nodemanager bad health and connection refused</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218723#M180624</link>
      <description>&lt;P&gt;I resolved all of this. I followed the steps &lt;A href="https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_command-line-installation/content/ch_uninstalling_hdp_chapter.html"&gt;here&lt;/A&gt; to remove all my packages, then deleted the HDP directory:&lt;BR /&gt;&lt;/P&gt;&lt;PRE&gt;rm -rf /usr/hdp/&lt;/PRE&gt;Then in Ambari I used the "Start all Services" command, and it went through and reinstalled everything for me. &lt;BR /&gt;&lt;BR /&gt;To solve the NodeManager issue, I then installed the Spark2 yarn-shuffle package, which gave me the missing jar, and copied that directory:&lt;BR /&gt;&lt;PRE&gt;/usr/hdp/3.0.0.0-.../spark2/aux/&lt;/PRE&gt;to all the other nodes in my cluster. Now all my NodeManagers are coming up and things are looking good.&lt;BR /&gt;I'm creating another post about my Timeline Service V2.0 issue, which is somehow still persisting.</description>
      <pubDate>Sat, 11 Aug 2018 07:05:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218723#M180624</guid>
      <dc:creator>daniel_zafar</dc:creator>
      <dc:date>2018-08-11T07:05:23Z</dc:date>
    </item>
    <item>
      <title>Re: Nodemanager bad health and connection refused</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218724#M180625</link>
      <description>&lt;P&gt;I'm glad it's all sorted now. Another way would have been to delete that particular node from the cluster, re-add it, and then add the Spark client on it. I did that on one of my test clusters recently and it worked.&lt;/P&gt;</description>
      <pubDate>Sat, 11 Aug 2018 12:10:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Nodemanager-bad-health-and-connection-refused/m-p/218724#M180625</guid>
      <dc:creator>pkadam</dc:creator>
      <dc:date>2018-08-11T12:10:58Z</dc:date>
    </item>
  </channel>
</rss>

