Reply
Explorer
Posts: 24
Registered: ‎06-24-2018

Yarn- Node Manager unexpected exists occurring after different periods

Hello,

 

Ok, so i had Node manager running completly fine and suprisingly it started to crash and exited every few minutes. For instance its exited at x time and minutes, after 10-15 minutes it will be back again.

 

I looked up to host logs and Node manager logs specifically, i found following message related to "stop instruction by container for application xxxx"

 

2018-08-06 23:10:09,842 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 3048 for container-id container_1533576341741_0986_01_000001: -1B of 1 GB physical memory used; -1B of 2.1 GB virtual memory used
2018-08-06 23:10:10,178 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch: Unable to recover container container_1533576341741_0986_01_000001
java.io.IOException: Timeout while waiting for exit code from container_1533576341741_0986_01_000001
at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:199)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:83)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2018-08-06 23:10:10,186 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch: Recovered container exited with a non-zero exit code 154
2018-08-06 23:10:10,191 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1533576341741_0986_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
2018-08-06 23:10:10,191 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1533576341741_0986_01_000001
2018-08-06 23:10:10,259 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /yarn/nm/usercache/dr.who/appcache/application_1533576341741_0986/container_1533576341741_0986_01_000001
2018-08-06 23:10:10,270 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=dr.who OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1533576341741_0986 CONTAINERID=container_1533576341741_0986_01_000001
2018-08-06 23:10:10,278 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1533576341741_0986_01_000001 transitioned from EXITED_WITH_FAILURE to DONE
2018-08-06 23:10:10,279 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1533576341741_0986_01_000001 from application application_1533576341741_0986
2018-08-06 23:10:10,280 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Considering container container_1533576341741_0986_01_000001 for log-aggregation
2018-08-06 23:10:10,280 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1533576341741_0986
2018-08-06 23:10:11,287 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed containers from NM context: [container_1533576341741_0986_01_000001]
2018-08-06 23:10:12,843 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1533576341741_0986_01_000001

 

 

Any one faced similar issue ? or can help me solve it ?

 

Thanks

Announcements