<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Node manager restart fails during upgrade / downgrade between 2.3.4 and 2.4.0 in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110517#M25853</link>
    <description>&lt;P&gt;This is a bug in Ambari. You can fix it by patching the upgrade script directly. (Posting here with my solution after suffering from this myself.) Edit /var/lib/ambari-agent/cache/common-services/YARN/your_YARN_version/package/scripts/nodemanager_upgrade.py on your NodeManager hosts:&lt;/P&gt;&lt;P&gt;At the top of the file with the other imports (line 20?), add:&lt;/P&gt;&lt;PRE&gt;import re&lt;/PRE&gt;&lt;P&gt;After line 65, add:&lt;/P&gt;&lt;PRE&gt;hostname_short = re.findall(r'^([^.]+)', hostname)[0]  # first DNS label, e.g. node1&lt;/PRE&gt;&lt;P&gt;Change line 71 to the following:&lt;/P&gt;&lt;PRE&gt;if hostname in yarn_output or nodemanager_address in yarn_output or hostname_ip in yarn_output or hostname_short in yarn_output:&lt;/PRE&gt;&lt;P&gt;The upgrade will now properly check for short hostnames when you hit "Retry".&lt;/P&gt;</description>
    <pubDate>Sat, 09 Jun 2018 02:58:39 GMT</pubDate>
    <dc:creator>jeff_stafford</dc:creator>
    <dc:date>2018-06-09T02:58:39Z</dc:date>
    <item>
      <title>Node manager restart fails during upgrade / downgrade between 2.3.4 and 2.4.0</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110510#M25846</link>
      <description>&lt;P&gt;We are upgrading HDP from 2.3.4 to 2.4.0 by following the instructions at the link below: &lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.1/bk_upgrading_Ambari/content/_upgrade_ambari.html"&gt;https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.1/bk_upgrading_Ambari/content/_upgrade_ambari.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;All the steps in the upgrade document up to “4.2 Perform express upgrade” completed successfully. &lt;/P&gt;&lt;P&gt;During the express upgrade, the step “Restarting NodeManager on 2 hosts” fails on one host and succeeds on the other. I tried to downgrade, but the downgrade also failed at the same step: &lt;/P&gt;&lt;P&gt;&amp;gt;&amp;gt; &lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;On host 1: &lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;[yarn@node1 ~]$ yarn node -list -states=RUNNING
16/04/21 13:49:25 INFO impl.TimelineClientImpl: Timeline service address: http://node2.domain.net:8188/ws/v1/timeline/
16/04/21 13:49:25 INFO client.RMProxy: Connecting to ResourceManager at node2.domain.net/13.111.111.11:8050
Total Nodes:2
         Node-Id  Node-State  Node-Http-Address  Number-of-Running-Containers
     node1:45454     RUNNING         node1:8042                             0
     node2:45454     RUNNING         node2:8042                             0&lt;/PRE&gt;&lt;P&gt;Below is the error message I see in the error log: &lt;/P&gt;&lt;PRE&gt;resource_management.core.exceptions.Fail: NodeManager with ID node1.domain.net:45454 was not found in the list of running NodeManagers&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;On host 2: &lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;[yarn@node2 sbin]$ yarn node -list -states=RUNNING
16/04/21 13:49:35 INFO impl.TimelineClientImpl: Timeline service address: http://node2.domain.net:8188/ws/v1/timeline/
16/04/21 13:49:35 INFO client.RMProxy: Connecting to ResourceManager at node2.domain.net/13.111.111.11:8050
Total Nodes:2
         Node-Id  Node-State  Node-Http-Address  Number-of-Running-Containers
     node1:45454     RUNNING         node1:8042                             0
     node2:45454     RUNNING         node2:8042                             0&lt;/PRE&gt;&lt;P&gt;No errors were reported while restarting the NodeManager on this server. &lt;/P&gt;&lt;P&gt;&amp;lt;&amp;lt; &lt;/P&gt;&lt;P&gt;The NodeManager status looks exactly the same on both nodes, but I am not sure why the restart status check fails on one node and not on the other. &lt;/P&gt;&lt;P&gt;How can I fix this issue? &lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/3590-node1-downgrade-log.txt"&gt;node1-downgrade-log.txt&lt;/A&gt; &lt;A href="https://community.cloudera.com/legacyfs/online/attachments/3601-node2-downgrade-log.txt"&gt;node2-downgrade-log.txt&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 22 Apr 2016 02:06:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110510#M25846</guid>
      <dc:creator>selvanand_panne</dc:creator>
      <dc:date>2016-04-22T02:06:22Z</dc:date>
    </item>
    <item>
      <title>Re: Node manager restart fails during upgrade / downgrade between 2.3.4 and 2.4.0</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110511#M25847</link>
      <description>&lt;P&gt;Can you post the NodeManager logs for node1 and node2 as well?  &lt;/P&gt;</description>
      <pubDate>Fri, 22 Apr 2016 03:08:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110511#M25847</guid>
      <dc:creator>ravi1</dc:creator>
      <dc:date>2016-04-22T03:08:25Z</dc:date>
    </item>
    <item>
      <title>Re: Node manager restart fails during upgrade / downgrade between 2.3.4 and 2.4.0</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110512#M25848</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/7977/selvanandpanneerselvam.html" nodeid="7977"&gt;@selvanand panneerselvam&lt;/A&gt;&lt;P&gt;I checked attached txt file and noticed that it is looking for NM FQDN with RPC Port 45454, see below logs&lt;/P&gt;&lt;PRE&gt;node1.domain.net:45454 was not found in the list of running NodeManagers&lt;/PRE&gt;&lt;P&gt;When you run yarn node -list -states=RUNNING command, I see the out has short hostnames without FQDN&lt;/P&gt;&lt;P&gt;Can you please check yarn.nodemanager.address?&lt;/P&gt;&lt;P&gt;Checking NM logs should give us a hint. &lt;/P&gt;</description>
      <pubDate>Fri, 22 Apr 2016 13:20:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110512#M25848</guid>
      <dc:creator>KuldeepK</dc:creator>
      <dc:date>2016-04-22T13:20:22Z</dc:date>
    </item>
    <item>
      <title>Re: Node manager restart fails during upgrade / downgrade between 2.3.4 and 2.4.0</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110513#M25849</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/504/kkulkarni.html" nodeid="504"&gt;@Kuldeep Kulkarni&lt;/A&gt; Thanks for looking into the issue.

yarn.nodemanager.address is set to the default 0.0.0.0 in both nodes. And `hostname` returns short hostname in both the nodes. I tried to work around the issue by hardcoding the hostname variable with short hostname in line# 66 of nodemanager_upgrade.py and the downgrade moved ahead completed fine. I tried upgrading to 2.4.0 and that too completed fine. I am not sure if this workaround has any side effects but smoke testing of the cluster post upgrade was successful.  
I am still wondering how come nodemanager@node2 was successful the first time since in node2 also the output of "yarn node -list -states=RUNNING" returned the hostnames without FQDN and the upgrade script was looking for host with FQDN.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Apr 2016 13:46:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110513#M25849</guid>
      <dc:creator>selvanand_panne</dc:creator>
      <dc:date>2016-04-22T13:46:20Z</dc:date>
    </item>
    <item>
      <title>Re: Node manager restart fails during upgrade / downgrade between 2.3.4 and 2.4.0</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110514#M25850</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/7977/selvanandpanneerselvam.html" nodeid="7977"&gt;@selvanand panneerselvam&lt;/A&gt; &lt;/P&gt;&lt;P&gt;Can you please check NM logs on both the NMs and let me know if you find something in there. &lt;/P&gt;</description>
      <pubDate>Fri, 22 Apr 2016 13:50:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110514#M25850</guid>
      <dc:creator>KuldeepK</dc:creator>
      <dc:date>2016-04-22T13:50:12Z</dc:date>
    </item>
    <item>
      <title>Re: Node manager restart fails during upgrade / downgrade between 2.3.4 and 2.4.0</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110515#M25851</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/504/kkulkarni.html" nodeid="504"&gt;@Kuldeep Kulkarni&lt;/A&gt; &lt;A rel="user" href="https://community.cloudera.com/users/216/ravi.html" nodeid="216"&gt;@Ravi Mutyala &lt;/A&gt;&lt;/P&gt;&lt;P&gt;i am not seeing error messages relating to this issue in the nodemanager logfiles. &lt;A href="https://community.cloudera.com/legacyfs/online/attachments/3604-nodemanagerlogs.zip"&gt;nodemanagerlogs.zip&lt;/A&gt;
&lt;A rel="user" href="https://community.cloudera.com/users/216/ravi.html" nodeid="216"&gt;&lt;/A&gt; &lt;/P&gt;</description>
      <pubDate>Fri, 22 Apr 2016 14:23:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110515#M25851</guid>
      <dc:creator>selvanand_panne</dc:creator>
      <dc:date>2016-04-22T14:23:02Z</dc:date>
    </item>
    <item>
      <title>Re: Node manager restart fails during upgrade / downgrade between 2.3.4 and 2.4.0</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110516#M25852</link>
      <description>&lt;P&gt;I faced the same issue in a 2.3.2 to 2.5 upgrade, where the NodeManager check failed on one node and passed on the other nodes, and I used the same workaround. Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 23 Sep 2016 11:55:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110516#M25852</guid>
      <dc:creator>anand_raghavan</dc:creator>
      <dc:date>2016-09-23T11:55:13Z</dc:date>
    </item>
    <item>
      <title>Re: Node manager restart fails during upgrade / downgrade between 2.3.4 and 2.4.0</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110517#M25853</link>
      <description>&lt;P&gt;This is a bug in Ambari. You can fix it by patching the upgrade script directly. (Posting here with my solution after suffering from this myself.) Edit /var/lib/ambari-agent/cache/common-services/YARN/your_YARN_version/package/scripts/nodemanager_upgrade.py on your NodeManager hosts:&lt;/P&gt;&lt;P&gt;At the top of the file with the other imports (line 20?), add:&lt;/P&gt;&lt;PRE&gt;import re&lt;/PRE&gt;&lt;P&gt;After line 65, add:&lt;/P&gt;&lt;PRE&gt;hostname_short = re.findall(r'^([^.]+)', hostname)[0]  # first DNS label, e.g. node1&lt;/PRE&gt;&lt;P&gt;Change line 71 to the following:&lt;/P&gt;&lt;PRE&gt;if hostname in yarn_output or nodemanager_address in yarn_output or hostname_ip in yarn_output or hostname_short in yarn_output:&lt;/PRE&gt;&lt;P&gt;The upgrade will now properly check for short hostnames when you hit "Retry".&lt;/P&gt;</description>
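The patched condition still relies on substring tests, so a short name such as node1 would also match a node listed as node10. A stricter alternative, sketched here as a hypothetical helper (`nodemanager_is_listed` is not part of Ambari), parses the Node-Id column and compares its host part exactly against both the FQDN and its first DNS label. The sample output is modeled on the one in the question, with spacing approximated and a node10 row added for illustration.

```python
import re

# Node rows in `yarn node -list` output start with a "host:port" Node-Id.
NODE_ID = re.compile(r'[\w.-]+:\d+')


def nodemanager_is_listed(hostname, yarn_output):
    """Hypothetical helper (not Ambari code): True if `hostname`, in FQDN or
    short form, is the host part of a Node-Id in `yarn node -list` output."""
    # First DNS label, e.g. "node1" from "node1.domain.net"; also works for
    # hyphenated names and names that are already short.
    hostname_short = hostname.split(".")[0]
    wanted = {hostname.lower(), hostname_short.lower()}
    for line in yarn_output.splitlines():
        fields = line.split()
        # Skip blank lines, the "Total Nodes" line, and the column header.
        if not fields or not NODE_ID.fullmatch(fields[0]):
            continue
        host = fields[0].rsplit(":", 1)[0].lower()
        if host in wanted:  # exact match, so "node1" does not match "node10"
            return True
    return False


sample = """Total Nodes:2
         Node-Id  Node-State  Node-Http-Address  Number-of-Running-Containers
    node10:45454     RUNNING        node10:8042                             0
     node2:45454     RUNNING         node2:8042                             0
"""

print("node1" in sample)                                  # True: substring false positive
print(nodemanager_is_listed("node1.domain.net", sample))  # False: node1 is not listed
print(nodemanager_is_listed("node2.domain.net", sample))  # True
```

Swapping a check like this in for the substring tests on line 71 would be a further local change to the script, so test it against your own cluster's output before relying on it.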
      <pubDate>Sat, 09 Jun 2018 02:58:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110517#M25853</guid>
      <dc:creator>jeff_stafford</dc:creator>
      <dc:date>2018-06-09T02:58:39Z</dc:date>
    </item>
    <item>
      <title>Re: Node manager restart fails during upgrade / downgrade between 2.3.4 and 2.4.0</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110518#M25854</link>
      <description>&lt;P&gt;Thanks Jeff, this helped me upgrade from HDP 2.6.4.0 to 2.6.5.0.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Oct 2018 20:54:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Node-manager-restart-fails-during-upgrade-downgrade-between/m-p/110518#M25854</guid>
      <dc:creator>ammills01</dc:creator>
      <dc:date>2018-10-25T20:54:00Z</dc:date>
    </item>
  </channel>
</rss>

