<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: NodeManager fails to start in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344560#M234201</link>
    <description>&lt;P&gt;Are all the hosts healthy?&lt;/P&gt;&lt;P&gt;To check go to: CM -&amp;gt; hosts -&amp;gt; All hosts&lt;/P&gt;</description>
    <pubDate>Thu, 26 May 2022 17:45:42 GMT</pubDate>
    <dc:creator>Elias</dc:creator>
    <dc:date>2022-05-26T17:45:42Z</dc:date>
    <item>
      <title>NodeManager fails to start</title>
      <link>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344554#M234198</link>
      <description>&lt;P&gt;Hello folks!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have an on-premises cluster of 9 machines running CDH 6.2: 3 masters, 1 edge node and 5 workers.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am not able to bring up the NodeManager on 2 of the 5 workers. 3 of them are OK, but the other 2 produce the attached log, which shows no error, only a warning with "NullPointerException".&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I start the NodeManager from Cloudera Manager, the start doesn't fail, but I get two alerts, as follows:&lt;/P&gt;&lt;P&gt;- NodeManager can not connect to ResourceManager&lt;/P&gt;&lt;P&gt;- ResourceManager could not connect to Web Server of NodeManager&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also, I can't access the /jmx endpoint of the server, and when I start the NodeManager from Cloudera Manager, CPU usage goes to 100%.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;On those 2 workers, the RegionServer and DataNode are working fine; the problem is only with the NodeManager.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any suggestions, please?&lt;/P&gt;</description>
      <pubDate>Thu, 26 May 2022 16:39:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344554#M234198</guid>
      <dc:creator>marcosrodrigues</dc:creator>
      <dc:date>2022-05-26T16:39:23Z</dc:date>
    </item>
    <item>
      <title>Re: NodeManager fails to start</title>
      <link>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344560#M234201</link>
      <description>&lt;P&gt;Are all the hosts healthy?&lt;/P&gt;&lt;P&gt;To check go to: CM -&amp;gt; hosts -&amp;gt; All hosts&lt;/P&gt;</description>
      <pubDate>Thu, 26 May 2022 17:45:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344560#M234201</guid>
      <dc:creator>Elias</dc:creator>
      <dc:date>2022-05-26T17:45:42Z</dc:date>
    </item>
    <item>
      <title>Re: NodeManager fails to start</title>
      <link>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344563#M234202</link>
      <description>&lt;P&gt;Kind of, Elias! Most of the time they are healthy, but I have now seen that there is an NTP clock integrity problem on 1 of the 9 machines, as follows:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="marcosrodrigues_0-1653588617865.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/34462i62242AB3888F4D75/image-size/medium?v=v2&amp;amp;px=400" role="button" title="marcosrodrigues_0-1653588617865.png" alt="marcosrodrigues_0-1653588617865.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The NodeManager problem affects hosts SPAPCRK03 and SPAPCRK04.&lt;/P&gt;&lt;P&gt;SPAPCRK03 is healthy, and SPAPCRK04 has an NTP clock alert.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I ran "ntpdc -np" and got this output:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="marcosrodrigues_1-1653589216336.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/34463i9540B5BD097A8FA4/image-size/medium?v=v2&amp;amp;px=400" role="button" title="marcosrodrigues_1-1653589216336.png" alt="marcosrodrigues_1-1653589216336.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But I don't know whether that NTP problem is what keeps the NodeManager from starting.&lt;/P&gt;</description>
      <pubDate>Thu, 26 May 2022 18:20:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344563#M234202</guid>
      <dc:creator>marcosrodrigues</dc:creator>
      <dc:date>2022-05-26T18:20:36Z</dc:date>
    </item>
    <item>
      <title>Re: NodeManager fails to start</title>
      <link>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344564#M234203</link>
      <description>&lt;P&gt;On those 2 nodes (03 and 04), I've stopped the NodeManager because it was consuming 100% of the CPU, and I didn't want it to cause further incidents on the cluster.&lt;/P&gt;</description>
      <pubDate>Thu, 26 May 2022 18:24:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344564#M234203</guid>
      <dc:creator>marcosrodrigues</dc:creator>
      <dc:date>2022-05-26T18:24:14Z</dc:date>
    </item>
    <item>
      <title>Re: NodeManager fails to start</title>
      <link>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344566#M234204</link>
      <description>&lt;P&gt;I tried starting the NodeManager with DEBUG logging and got this:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="marcosrodrigues_0-1653590450312.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/34464i465B744367B97A2A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="marcosrodrigues_0-1653590450312.png" alt="marcosrodrigues_0-1653590450312.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;That is the only WARN we have in the log.&lt;/P&gt;</description>
      <pubDate>Thu, 26 May 2022 18:41:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344566#M234204</guid>
      <dc:creator>marcosrodrigues</dc:creator>
      <dc:date>2022-05-26T18:41:32Z</dc:date>
    </item>
    <item>
      <title>Re: NodeManager fails to start</title>
      <link>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344594#M234214</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/98199"&gt;@marcosrodrigues&lt;/a&gt;&amp;nbsp;,&amp;nbsp;&lt;/P&gt;&lt;P&gt;the message says:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;2022-05-26 13:15:58,296 WARN org.apache.hadoop.util.concurrent.ExecutorHelper: Caught exception in thread DeletionService #0:
java.lang.NullPointerException: path cannot be null
...
        at org.apache.hadoop.fs.FileContext.delete(FileContext.java:768)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.FileDeletionTask.run(FileDeletionTask.java:109)
...&lt;/LI-CODE&gt;&lt;P&gt;which means that the NM on those nodes tried to delete some "empty"/null paths. It is not clear where these null paths come from, and I haven't found any known YARN bug related to this.&lt;/P&gt;&lt;P&gt;Are these NodeManagers configured the same way as all the others?&amp;nbsp;&lt;/P&gt;&lt;P&gt;Do the YARN NodeManager local directories ("NodeManager Local Directories" - "yarn.nodemanager.local-dirs") exist, and are they readable/writable by the "yarn" user? Are those directories completely empty?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;&amp;nbsp;Miklos Szurap&lt;/P&gt;&lt;P&gt;Customer Operations Engineer&lt;/P&gt;</description>
      <pubDate>Fri, 27 May 2022 09:17:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344594#M234214</guid>
      <dc:creator>mszurap</dc:creator>
      <dc:date>2022-05-27T09:17:23Z</dc:date>
    </item>
    <item>
      <title>Re: NodeManager fails to start</title>
      <link>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344605#M234219</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/12885"&gt;@mszurap&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Yes, all NodeManagers are configured the same way. I checked in Cloudera Manager, setting by setting.&lt;/P&gt;&lt;P&gt;Also, the NodeManager Local Directories setting is set:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="marcosrodrigues_0-1653661802059.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/34468i4E22F4E3F5AA8C9A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="marcosrodrigues_0-1653661802059.png" alt="marcosrodrigues_0-1653661802059.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And the directories are readable/writable by the "yarn" user:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="marcosrodrigues_1-1653661884543.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/34469i65CC71DFE62BCED3/image-size/medium?v=v2&amp;amp;px=400" role="button" title="marcosrodrigues_1-1653661884543.png" alt="marcosrodrigues_1-1653661884543.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I checked the same path on the other nodes, and the permissions are right. The only difference is that on the machines that work fine, the "nmPrivate" directory was updated minutes ago, while on 03 (with issues) it was last updated on May 9 at 17:31 (around the time we noticed the node had shut down). 
The folders aren't empty.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;As for the NullPointerException, it's curious: we started the NodeManager with DEBUG and TRACE logging enabled, and in the DEBUG output we found the following just before the WARN about the NPE:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;2022-05-26 19:51:17,963 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.DeletionTask: Running DeletionTask : FileDeletionTask :  id : 2543016  user : null  subDir : null  baseDir : null
2022-05-26 19:51:17,963 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.DeletionTask: NM deleting absolute path : null
2022-05-26 19:51:17,964 DEBUG org.apache.hadoop.util.concurrent.ExecutorHelper: afterExecute in thread: DeletionService #0, runnable type: java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask
2022-05-26 19:51:17,964 WARN org.apache.hadoop.util.concurrent.ExecutorHelper: Execution exception when running task in DeletionService #0
2022-05-26 19:51:17,965 WARN org.apache.hadoop.util.concurrent.ExecutorHelper: Caught exception in thread DeletionService #0:
java.lang.NullPointerException: path cannot be null
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
        at org.apache.hadoop.fs.FileContext.fixRelativePart(FileContext.java:270)
        at org.apache.hadoop.fs.FileContext.delete(FileContext.java:768)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.FileDeletionTask.run(FileDeletionTask.java:109)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)&lt;/LI-CODE&gt;&lt;P&gt;It says the user is "null", subDir is "null" and baseDir is "null", but I don't know where YARN looks up that user, subDir and baseDir.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any idea?&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 27 May 2022 14:43:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344605#M234219</guid>
      <dc:creator>marcosrodrigues</dc:creator>
      <dc:date>2022-05-27T14:43:28Z</dc:date>
    </item>
    <item>
      <title>Re: NodeManager fails to start</title>
      <link>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344612#M234221</link>
      <description>&lt;P&gt;I would try to clean up everything in the /appN/yarn/nm directory (at least, as the root user, try moving "filecache", "nmPrivate" and "usercache" out to an external directory); maybe there are some files that the NM cannot clean up for some reason.&lt;/P&gt;&lt;P&gt;If that still does not help, then I can imagine that the ResourceManager state store (in ZooKeeper) keeps track of some old job details and the NM tries to clean up after those old containers.&lt;/P&gt;&lt;P&gt;Is this cluster a prod cluster? If not, you could stop all the YARN applications, stop the YARN service, and then format the RM state store to get back to a clean state.&lt;/P&gt;&lt;P&gt;&lt;A href="https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnCommands.html" target="_blank"&gt;https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnCommands.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;In CM there is an action for it under the YARN service.&lt;/P&gt;</description>
      <pubDate>Fri, 27 May 2022 16:44:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344612#M234221</guid>
      <dc:creator>mszurap</dc:creator>
      <dc:date>2022-05-27T16:44:18Z</dc:date>
    </item>
    <item>
      <title>Re: NodeManager fails to start</title>
      <link>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344627#M234222</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/12885"&gt;@mszurap&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I did what you said: I took all the content out of /appN/yarn/nm, including the directories "filecache", "nmPrivate" and "usercache", so that "/appN/yarn/nm" was left with 0 directories and 0 files. Then I started the NodeManager from Cloudera Manager and got the same error: all services on that machine start successfully except the NodeManager. Also, the directory "/appN/yarn/nm" stays empty, even after trying to start the NodeManager from CM.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I noticed that when I run "yarn nodemanager" as the root user, the NodeManager runs with no error, with some parameters that differ from the CM start command, but Cloudera Manager doesn't recognize the node. When I start it from CM, the command has the same parameters as on the NodeManagers that are OK. Maybe it has something to do with the yarn user?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;About the RM state store, I didn't find any information on that. 
The link you sent says to run "&lt;SPAN&gt;-format-state-store" only if the ResourceManager is not running, and in our cluster it is running fine and recognizing 3 of the 5 nodes.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="marcosrodrigues_0-1653682839935.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/34470i5698E149FD16CD76/image-size/medium?v=v2&amp;amp;px=400" role="button" title="marcosrodrigues_0-1653682839935.png" alt="marcosrodrigues_0-1653682839935.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Unfortunately, it is a production cluster, so I don't think I can stop YARN entirely.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Do you have any suggestions?&lt;/P&gt;</description>
      <pubDate>Fri, 27 May 2022 20:21:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344627#M234222</guid>
      <dc:creator>marcosrodrigues</dc:creator>
      <dc:date>2022-05-27T20:21:49Z</dc:date>
    </item>
    <item>
      <title>Re: NodeManager fails to start</title>
      <link>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344660#M234234</link>
      <description>&lt;P&gt;Be careful with starting processes as the root user, as that may leave some files and directories around owned by root, and then the ordinary "yarn" user (the process started by CM) won't be able to write to them. For example, log files under /var/log/hadoop-yarn/... Please verify that.&lt;/P&gt;</description>
      <pubDate>Mon, 30 May 2022 18:02:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344660#M234234</guid>
      <dc:creator>mszurap</dc:creator>
      <dc:date>2022-05-30T18:02:49Z</dc:date>
    </item>
    <item>
      <title>Re: NodeManager fails to start</title>
      <link>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344920#M234304</link>
      <description>&lt;P&gt;Hi guys!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We finally solved the problem. To fix it, we moved all the content from the "&lt;SPAN&gt;yarn.nodemanager.recovery.dir" config path to another location (e.g. mv yarn-rm-recovery yarn-rm-recovery-backup), created yarn-rm-recovery again, and granted permission on the folder to yarn:hadoop.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;After that, we were able to start the NodeManager with no errors.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks all!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jun 2022 15:47:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344920#M234304</guid>
      <dc:creator>marcosrodrigues</dc:creator>
      <dc:date>2022-06-02T15:47:10Z</dc:date>
    </item>
    <item>
      <title>Re: NodeManager fails to start</title>
      <link>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344958#M234316</link>
      <description>&lt;P&gt;That is great, thank you for sharing the solution!&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Best regards&lt;/P&gt;&lt;P&gt;&amp;nbsp;Miklos&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jun 2022 07:48:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/NodeManager-fails-to-start/m-p/344958#M234316</guid>
      <dc:creator>mszurap</dc:creator>
      <dc:date>2022-06-03T07:48:54Z</dc:date>
    </item>
  </channel>
</rss>

