Member since 04-21-2016
3 Posts
2 Kudos Received
0 Solutions
04-22-2016
07:23 AM
@Kuldeep Kulkarni @Ravi Mutyala I am not seeing any error messages related to this issue in the NodeManager log files.
Attachment: nodemanagerlogs.zip
04-22-2016
06:46 AM
@Kuldeep Kulkarni Thanks for looking into the issue.
yarn.nodemanager.address is set to the default, 0.0.0.0, on both nodes, and `hostname` returns the short hostname on both nodes. I worked around the issue by hardcoding the hostname variable to the short hostname at line 66 of nodemanager_upgrade.py, after which the downgrade proceeded and completed fine. I then tried upgrading to 2.4.0, and that completed fine as well. I am not sure whether this workaround has any side effects, but smoke testing of the cluster after the upgrade was successful.
I am still wondering why the NodeManager on node2 succeeded the first time: on node2, too, the output of "yarn node -list -states=RUNNING" listed the hostnames without the FQDN, while the upgrade script was looking for a host with the FQDN.
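To illustrate the mismatch that the workaround sidesteps, here is a minimal sketch of a comparison that treats the FQDN and short forms of a NodeManager ID as the same node. This is not the code in nodemanager_upgrade.py; the helper names short_name and same_node are hypothetical, and the IDs are the ones from this thread.

```python
def short_name(host):
    """Return the part of a hostname before the first dot, lowercased."""
    return host.split(".", 1)[0].lower()


def same_node(expected_id, reported_id):
    """Compare two 'host:port' NodeManager IDs, ignoring any domain suffix.

    Hypothetical helper for illustration; not part of Ambari.
    """
    exp_host, _, exp_port = expected_id.rpartition(":")
    rep_host, _, rep_port = reported_id.rpartition(":")
    return exp_port == rep_port and short_name(exp_host) == short_name(rep_host)


# The script expected the FQDN form, while the ResourceManager
# reported the short form.
print(same_node("node1.domain.net:45454", "node1:45454"))  # True
print(same_node("node1.domain.net:45454", "node2:45454"))  # False
```

One possible side effect of matching on short names, whether done this way or by hardcoding: two hosts that share a short name in different domains would be treated as the same node.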
04-21-2016
07:06 PM
2 Kudos
We are upgrading HDP from 2.3.4 to 2.4.0 by following the instructions at the link below:
https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.1/bk_upgrading_Ambari/content/_upgrade_ambari.html
All steps in the upgrade document up to "4.2 Perform express upgrade" completed successfully. During the express upgrade, the step "Restarting NodeManager on 2 hosts" fails on one host and succeeds on the other. I tried to downgrade, but the downgrade failed at the same step.

On host 1:

[yarn@node1 ~]$ yarn node -list -states=RUNNING
16/04/21 13:49:25 INFO impl.TimelineClientImpl: Timeline service address: http://node2.domain.net:8188/ws/v1/timeline/
16/04/21 13:49:25 INFO client.RMProxy: Connecting to ResourceManager at node2.domain.net/13.111.111.11:8050
Total Nodes:2
Node-Id        Node-State   Node-Http-Address   Number-of-Running-Containers
node1:45454    RUNNING      node1:8042          0
node2:45454    RUNNING      node2:8042          0

Below is the error message I see in the error log:

resource_management.core.exceptions.Fail: NodeManager with ID node1.domain.net:45454 was not found in the list of running NodeManagers

On host 2:

[yarn@node2 sbin]$ yarn node -list -states=RUNNING
16/04/21 13:49:35 INFO impl.TimelineClientImpl: Timeline service address: http://node2.domain.net:8188/ws/v1/timeline/
16/04/21 13:49:35 INFO client.RMProxy: Connecting to ResourceManager at node2.domain.net/13.111.111.11:8050
Total Nodes:2
Node-Id        Node-State   Node-Http-Address   Number-of-Running-Containers
node1:45454    RUNNING      node1:8042          0
node2:45454    RUNNING      node2:8042          0

No errors were reported while restarting the NodeManager on this server.

The NodeManager status looks exactly the same on both nodes, so I am not sure why the restart status check fails on one node and not on the other. How can I fix this issue?

Attachments: node1-downgrade-log.txt, node2-downgrade-log.txt
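The error can be reproduced outside Ambari with a small sketch that pulls the Node-Id column out of the "yarn node -list -states=RUNNING" output and looks for the ID named in the error message. The sample text is the host 1 output from this post; the regular expression and the function name running_node_ids are assumptions made for illustration, and the actual check in the upgrade script may be implemented differently.

```python
import re

# Output of "yarn node -list -states=RUNNING" as posted for host 1.
SAMPLE_OUTPUT = """\
16/04/21 13:49:25 INFO impl.TimelineClientImpl: Timeline service address: http://node2.domain.net:8188/ws/v1/timeline/
16/04/21 13:49:25 INFO client.RMProxy: Connecting to ResourceManager at node2.domain.net/13.111.111.11:8050
Total Nodes:2
Node-Id        Node-State   Node-Http-Address   Number-of-Running-Containers
node1:45454    RUNNING      node1:8042          0
node2:45454    RUNNING      node2:8042          0
"""

# A data row is: <host:port> RUNNING <host:port> <container count>.
ROW = re.compile(r"^\s*(\S+:\d+)\s+RUNNING\s+\S+:\d+\s+\d+\s*$")


def running_node_ids(output):
    """Return the Node-Id values from the RUNNING rows of the listing."""
    ids = []
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            ids.append(match.group(1))
    return ids


ids = running_node_ids(SAMPLE_OUTPUT)
print(ids)                              # ['node1:45454', 'node2:45454']
print("node1.domain.net:45454" in ids)  # False: the ID from the error message
print("node1:45454" in ids)             # True: the short form that YARN reports
```

With this listing, an exact lookup of node1.domain.net:45454 finds nothing, which is consistent with the "was not found in the list of running NodeManagers" message, while the short form node1:45454 is present.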
Labels: Apache Ambari