
Nodemanager connection refused and bad health


Expert Contributor

Hi Folks


Hope you are all doing well!

We are running HDP 2.6.5 on a 20-node cluster. Every day we see NodeManager health alerts and connection-refused errors, and sometimes the NodeManager restarts itself. Here are excerpts from the NodeManager log file:

Shutdown logs:
2019-04-02 22:28:20,947 INFO  monitor.ContainersMonitorImpl - Memory usage of ProcessTree 6948 for container-id container_e17_1553506205851_46103_01_000054: 520.5 MB of 2 GB physical memory used; 3.6 GB of 4.2 GB virtual memory used
2019-04-02 22:28:20,948 WARN  monitor.ContainersMonitorImpl - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
2019-04-02 22:28:21,162 INFO  launcher.ContainerLaunch - Container container_e17_1553506205851_46103_01_000055 succeeded
2019-04-02 22:28:21,660 INFO  ipc.Server - Stopping server on 8040
2019-04-02 22:28:21,661 INFO  ipc.Server - Stopping IPC Server Responder
2019-04-02 22:28:21,662 INFO  localizer.ResourceLocalizationService - Public cache exiting
2019-04-02 22:28:21,663 INFO  ipc.Server - Stopping IPC Server listener on 8040
2019-04-02 22:28:21,683 INFO  nodemanager.NodeManager - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at wn
NodeManager connection refused and bad health logs:
2019-04-05 06:00:02,972 INFO  nodemanager.NodeStatusUpdaterImpl - Sending out 61 NM container statuses: [[container_e23_1554290874215_9521_01_000002, CreateTime: 1554442308478, State: RUNNING, Capability: <memory:4096, vCores:1>, Diagnostics: , ExitStatus: -1000, Priority: 0], [container_e23_1554290874215_9607_01_000075, CreateTime: 1554443718725, State: COMPLETE, Capability: <memory:2048, vCores:1>, Diagnostics: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143.
, ExitStatus: 143, Priority: 20], [container_e23_1554290874215_9607_01_000076, CreateTime: 1554443718726, State: COMPLETE, Capability: <memory:2048, vCores:1>, Diagnostics: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143.

The ApplicationMaster (AM) is killing the containers without reporting any error.
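For anyone reading the container statuses above: by the usual Unix convention, an exit code above 128 means the process was ended by a signal, so 143 would correspond to 128 + 15, i.e. SIGTERM. The snippet below is only a rough sketch of that arithmetic. The function name `describe_exit_status` is made up for illustration, and the treatment of negative values such as -1000 as a YARN-internal marker (not an OS exit code) is an assumption on my part.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Give a rough, human-readable reading of a container exit status."""
    if status < 0:
        # Assumption: negative values are YARN-internal markers, not OS exit
        # codes (e.g. -1000 on a container that is still RUNNING).
        return "not an OS exit code"
    if status == 0:
        return "exited normally"
    if status > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with error code {status}"


if __name__ == "__main__":
    for code in (143, 0, -1000):
        print(code, "->", describe_exit_status(code))
```

If that reading is right, the 143 entries would just mean the containers received a normal termination request, which matches the "Container killed by the ApplicationMaster" diagnostics and may not be related to the NodeManager shutdown itself.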


Could someone help me to sort out this issue?



Regards,

VInay K

1 REPLY

Re: Nodemanager connection refused and bad health

New Contributor

The shutdown will work when you add the tag for this attribute in the code, so that the reader will read the tag and assign the position based on the setting.