<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Node managers in stopped state in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Node-managers-in-stopped-state/m-p/197213#M159264</link>
    <description>&lt;P&gt;I have resolved the issue.&lt;/P&gt;&lt;P&gt;All cluster resources were 100% utilised because of a security breach: a cron job was using the YARN service to consume them.&lt;/P&gt;&lt;P&gt;Resolution: I closed all public ports and IPs and deleted the cron jobs from /var/spool/cron/crontabs.&lt;/P&gt;&lt;P&gt;Fortunately it was just a test cluster, and the network admin had opened the ports for a while. So don't keep any ports public in your cluster.&lt;/P&gt;</description>
    <pubDate>Thu, 26 Jul 2018 13:37:26 GMT</pubDate>
    <dc:creator>andy</dc:creator>
    <dc:date>2018-07-26T13:37:26Z</dc:date>
    <item>
      <title>Node managers in stopped state</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Node-managers-in-stopped-state/m-p/197212#M159263</link>
      <description>&lt;P&gt;
	All NodeManagers go into the stopped state within a couple of seconds of starting up. After a manual start, the NodeManager status shows as active, yet it still remains in the stopped state. All jobs remain in the ACCEPTED state.&lt;/P&gt;&lt;P&gt;
	I find the following error in the NodeManager logs:&lt;/P&gt;&lt;PRE&gt;2018-07-23 17:23:28,988 ERROR launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(88)) - Unable to recover container container_e101_1532344242009_0069_01_000001
java.io.IOException: Timeout while waiting for exit code from container_e101_1532344242009_0069_01_000001
	at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:205)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:83)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
2018-07-23 17:23:28,989 WARN  launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(106)) - Recovered container exited with a non-zero exit code 154
2018-07-23 17:23:28,991 INFO  container.ContainerImpl (ContainerImpl.java:handle(1136)) - Container container_e101_1532344242009_0069_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
2018-07-23 17:23:28,991 INFO  launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(371)) - Cleaning up container container_e101_1532344242009_0069_01_000001
2018-07-23 17:23:29,006 ERROR launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(88)) - Unable to recover container container_e101_1532344242009_0071_01_000001&lt;/PRE&gt;</description>
      <pubDate>Mon, 23 Jul 2018 19:18:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Node-managers-in-stopped-state/m-p/197212#M159263</guid>
      <dc:creator>andy</dc:creator>
      <dc:date>2018-07-23T19:18:31Z</dc:date>
    </item>
    <item>
      <title>Re: Node managers in stopped state</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Node-managers-in-stopped-state/m-p/197213#M159264</link>
      <description>&lt;P&gt;I have resolved the issue.&lt;/P&gt;&lt;P&gt;All cluster resources were 100% utilised because of a security breach: a cron job was using the YARN service to consume them.&lt;/P&gt;&lt;P&gt;Resolution: I closed all public ports and IPs and deleted the cron jobs from /var/spool/cron/crontabs.&lt;/P&gt;&lt;P&gt;Fortunately it was just a test cluster, and the network admin had opened the ports for a while. So don't keep any ports public in your cluster.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Jul 2018 13:37:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Node-managers-in-stopped-state/m-p/197213#M159264</guid>
      <dc:creator>andy</dc:creator>
      <dc:date>2018-07-26T13:37:26Z</dc:date>
    </item>
    <item>
      <title>Re: Node managers in stopped state</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Node-managers-in-stopped-state/m-p/197214#M159265</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/66523/yarn-node-manager-not-starting.html" target="_blank"&gt;https://community.hortonworks.com/questions/66523/yarn-node-manager-not-starting.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If yarn.nodemanager.recovery.enabled is set to true in yarn-site.xml, set it to false (provided that turning off NodeManager recovery is acceptable for you).&lt;/P&gt;</description>
      <pubDate>Fri, 30 Nov 2018 19:33:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Node-managers-in-stopped-state/m-p/197214#M159265</guid>
      <dc:creator>nitin_s_a_svp</dc:creator>
      <dc:date>2018-11-30T19:33:37Z</dc:date>
    </item>
  </channel>
</rss>

