Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Node managers in stopped state

avatar
Explorer

All nodemanagers go into stopped state within a couple of seconds after starting up.The nodemanager status remains active after manually starting up but still remains in stopped state.All jobs remain in accepted state.

I find the following error in nodemanager logs

2018-07-23 17:23:28,988 ERROR launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(88)) - Unable to recover container container_e101_1532344242009_0069_01_000001
java.io.IOException: Timeout while waiting for exit code from container_e101_1532344242009_0069_01_000001
	at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:205)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:83)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
2018-07-23 17:23:28,989 WARN  launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(106)) - Recovered container exited with a non-zero exit code 154
2018-07-23 17:23:28,991 INFO  container.ContainerImpl (ContainerImpl.java:handle(1136)) - Container container_e101_1532344242009_0069_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
2018-07-23 17:23:28,991 INFO  launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(371)) - Cleaning up container container_e101_1532344242009_0069_01_000001
2018-07-23 17:23:29,006 ERROR launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(88)) - Unable to recover container container_e101_1532344242009_0071_01_000001
1 ACCEPTED SOLUTION

avatar
Explorer
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login
2 REPLIES 2

avatar
Explorer
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login

avatar
New Contributor

https://community.hortonworks.com/questions/66523/yarn-node-manager-not-starting.html

If nodemanager.recovery.enabled is set to true, set it to false. (If turning off recovery is fine for you)