container exception Exit code: 127

I'm facing frequent container exceptions on one particular cluster, but the same set of jobs runs fine on another cluster.

Exception from container-launch.

Container id: container_e64_1481762217559_27152_01_000002

Exit code: 127

Stack trace: ExitCodeException exitCode=127:

at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)

at org.apache.hadoop.util.Shell.run(Shell.java:487)

at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)

at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:371)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:303)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)

at java.util.concurrent.FutureTask.run(FutureTask.java:262)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)

Shell output: main : command provided 1

main : run as user is xyz

main : requested yarn user is xyz

Exit code 127 usually points to an issue specific to the user application rather than to YARN itself.

https://issues.apache.org/jira/browse/YARN-3704
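
For context, 127 is the status a POSIX shell returns when the command it was asked to run cannot be found, which is one way a launch script can fail on one cluster and succeed on another. The following is a minimal sketch of that convention only, not of what YARN actually executes; the command name `no_such_command_xyz` is made up for the example.

```python
import subprocess


def shell_exit_code(command: str) -> int:
    """Run `command` through /bin/sh and return the shell's exit status."""
    result = subprocess.run(
        ["/bin/sh", "-c", command],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    # Hypothetical command that does not exist on PATH: the shell reports 127.
    print(shell_exit_code("no_such_command_xyz"))
    # A command that exists and succeeds reports 0.
    print(shell_exit_code("true"))
```

If the same status shows up for a container, it may be worth checking whether every binary the launch script calls exists on the failing nodes.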

Also, please share any application-specific exceptions you are seeing apart from this one.

@gsharm I don't see any application-specific error messages. All I can see are the error messages below.
2016-12-28 08:28:12,219 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1481762217559_27152_m_000000_1: Exception from container-launch.
Container id: container_e64_1481762217559_27152_01_000007
Exit code: 127
Stack trace: ExitCodeException exitCode=127: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
	at org.apache.hadoop.util.Shell.run(Shell.java:487)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:371)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:303)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

I am seeing a few thread messages too:


INFO [Thread-55] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 1 failures on node xyz_123.com

2016-12-28 08:28:34,579 INFO [Thread-77] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
2016-12-28 08:28:34,582 INFO [Thread-77] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Setting job diagnostics to Task failed task_1481762217559_27152_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
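
Since the jobs fail on only one cluster, it may help to see whether the failures concentrate on particular nodes. Here is a rough sketch that tallies the `RMContainerRequestor` lines from an ApplicationMaster log. It assumes the line format quoted above and assumes the number is a running count per node, so the largest value seen is kept; the function name `failures_by_node` is made up for this example.

```python
import re
from typing import Dict, Iterable

# Pattern based on the log line quoted in this thread (an assumption about
# the format, not a documented contract).
FAILURE_RE = re.compile(r"RMContainerRequestor: (\d+) failures on node (\S+)")


def failures_by_node(lines: Iterable[str]) -> Dict[str, int]:
    """Return the highest failure count reported for each node."""
    counts: Dict[str, int] = {}
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            count, node = int(match.group(1)), match.group(2)
            counts[node] = max(counts.get(node, 0), count)
    return counts


if __name__ == "__main__":
    sample = [
        "INFO [Thread-55] org.apache.hadoop.mapreduce.v2.app.rm."
        "RMContainerRequestor: 1 failures on node xyz_123.com",
    ]
    print(failures_by_node(sample))
```

If one or two hosts account for most of the failures, comparing their environment with a healthy node is a reasonable next step.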

I am also seeing the messages below from the NodeManager:

2016-12-28 08:28:26,005 INFO containermanager.AuxServices (AuxServices.java:handle(196)) - Got event CONTAINER_STOP for appId application_1481762217559_27152

2016-12-28 08:28:26,005 INFO yarn.YarnShuffleService (YarnShuffleService.java:stopContainer(189)) - Stopping container container_e64_1481762217559_27152_01_000008

2016-12-28 08:28:26,481 INFO ipc.Server (Server.java:saslProcess(1441)) - Auth successful for appattempt_1481762217559_27152_000001 (auth:SIMPLE)

2016-12-28 08:28:26,491 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(135)) - Authorization successful for appattempt_1481762217559_27152_000001 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB

2016-12-28 08:28:26,491 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:stopContainerInternal(966)) - Stopping container with container Id: container_e64_1481762217559_27152_01_000008

2016-12-28 08:28:26,492 INFO nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(89)) - USER=ajamwal IP=10.246.73.94 OPERATION=Stop Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1481762217559_27152 CONTAINERID=container_e64_1481762217559_27152_01_000008

2016-12-28 08:28:26,817 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:processHeartbeat(674)) - Unknown localizer with localizerId container_e64_1481762217559_27152_01_000008 is sending heartbeat. Ordering it to DIE

2016-12-28 08:28:26,818 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:processHeartbeat(674)) - Unknown localizer with localizerId container_e64_1481762217559_27152_01_000008 is sending heartbeat. Ordering it to DIE

2016-12-28 08:28:27,227 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(1131)) - Localizer failed

java.io.IOException: java.lang.InterruptedException

at org.apache.hadoop.util.Shell.runCommand(Shell.java:579)

at org.apache.hadoop.util.Shell.run(Shell.java:487)

at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)

at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:258)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1113)

2016-12-28 08:28:28,016 INFO nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:removeOrTrackCompletedContainersFromContext(529)) - Removed completed containers from NM context: [container_e64_1481762217559_27152_01_000008]

Exception from container-launch.
Container id: container_e64_1481762217559_27152_01_000002
Exit code: 127
Stack trace: ExitCodeException exitCode=127:
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
	at org.apache.hadoop.util.Shell.run(Shell.java:487)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:371)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:303)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Shell output: main : command provided 1
main : run as user is ajamwal
main : requested yarn user is ajamwal

Container exited with a non-zero exit code 127
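
One possible way to compare a failing node with a healthy one is to check whether the commands the container needs resolve on `PATH` for the user the container runs as. The sketch below only illustrates the idea: the command list (`bash`, `java`) is an assumption, not something confirmed by the logs in this thread, and `locate_commands` is a made-up helper name.

```python
import shutil
from typing import Dict, Iterable, Optional


def locate_commands(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each command name to its resolved path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Assumed list; adjust it to whatever the job's launch script invokes.
    for name, path in locate_commands(["bash", "java"]).items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

Running it as the same user on a node from each cluster and diffing the output would show whether a missing or differently located binary could explain the 127.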