Member since
04-07-2017
3
Posts
0
Kudos Received
0
Solutions
04-10-2017
05:36 PM
Greetings, We are running a 10-datanode HDP v2.5 cluster on Ubuntu 14.04 installed by ambari. Whenever I run a large yarn job I get the following result: Container killed by the ApplicationMaster. Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143 I'm not sure what is causing this problem. Can someone help troubleshoot? Here is the yarn-yarn-nodemanager-datanode1.log: 2017-04-03 10:15:18,140 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(810)) - Start request for container_e10_1484675915702_18333_01_000003 by user root
2017-04-03 10:15:18,151 INFO application.ApplicationImpl (ApplicationImpl.java:transition(304)) - Adding container_e10_1484675915702_18333_01_000003 to application application_1484675915702_18333
2017-04-03 10:15:18,153 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e10_1484675915702_18333_01_000003 transitioned from NEW to LOCALIZING
2017-04-03 10:15:18,157 INFO yarn.YarnShuffleService (YarnShuffleService.java:initializeContainer(184)) - Initializing container container_e10_1484675915702_18333_01_000003
2017-04-03 10:15:18,157 INFO yarn.YarnShuffleService (YarnShuffleService.java:initializeContainer(185)) - Initializing container container_e10_1484675915702_18333_01_000003
2017-04-03 10:15:18,358 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:handle(712)) - Created localizer for container_e10_1484675915702_18333_01_000003
2017-04-03 10:15:18,406 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:writeCredentials(1194)) - Writing credentials to the nmPrivate file /grid/3/hadoop/yarn/local/nmPrivate/container_e10_1484675915702_18333_01_000003.tokens. Credentials list:
2017-04-03 10:15:18,407 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e10_1484675915702_18333_01_000003 transitioned from LOCALIZING to LOCALIZED
2017-04-03 10:15:18,458 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e10_1484675915702_18333_01_000003 transitioned from LOCALIZED to RUNNING
2017-04-03 10:15:18,462 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:buildCommandExecutor(281)) - launchContainer: [bash, /grid/1/hadoop/yarn/local/usercache/root/appcache/application_1484675915702_18333/container_e10_1484675915702_18333_01_000003/default_container_executor.sh]
2017-04-03 10:15:18,465 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:startLocalizer(126)) - Copying from /grid/3/hadoop/yarn/local/nmPrivate/container_e10_1484675915702_18333_01_000003.tokens to /grid/2/hadoop/yarn/local/usercache/root/appcache/application_1484675915702_18333/container_e10_1484675915702_18333_01_000003.tokens
2017-04-03 10:15:20,998 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(375)) - Starting resource-monitoring for container_e10_1484675915702_18333_01_000003
2017-04-03 10:15:21,144 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 851 for container-id container_e10_1484675915702_18333_01_000003: 148.7 MB of 2 GB physical memory used; 2.1 GB of 4.2 GB virtual memory used
2017-04-03 10:15:24,293 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(464)) - Memory usage of ProcessTree 851 for container-id container_e10_1484675915702_18333_01_000003: 305.4 MB of 2 GB physical memory used; 2.4 GB of 4.2 GB virtual memory used
2017-04-03 10:15:24,734 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:stopContainerInternal(960)) - Stopping container with container Id: container_e10_1484675915702_18333_01_000003
2017-04-03 10:15:24,734 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e10_1484675915702_18333_01_000003 transitioned from RUNNING to KILLING
2017-04-03 10:15:24,734 INFO launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(425)) - Cleaning up container container_e10_1484675915702_18333_01_000003
2017-04-03 10:15:24,743 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(237)) - Exit code from container container_e10_1484675915702_18333_01_000003 is : 143
2017-04-03 10:15:24,756 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e10_1484675915702_18333_01_000003 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2017-04-03 10:15:24,757 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(480)) - Deleting absolute path : /grid/1/hadoop/yarn/local/usercache/root/appcache/application_1484675915702_18333/container_e10_1484675915702_18333_01_000003
2017-04-03 10:15:24,757 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(480)) - Deleting absolute path : /grid/2/hadoop/yarn/local/usercache/root/appcache/application_1484675915702_18333/container_e10_1484675915702_18333_01_000003
2017-04-03 10:15:24,757 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(480)) - Deleting absolute path : /grid/3/hadoop/yarn/local/usercache/root/appcache/application_1484675915702_18333/container_e10_1484675915702_18333_01_000003
2017-04-03 10:15:24,757 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(480)) - Deleting absolute path : /grid/0/hadoop/yarn/local/usercache/root/appcache/application_1484675915702_18333/container_e10_1484675915702_18333_01_000003
2017-04-03 10:15:24,757 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e10_1484675915702_18333_01_000003 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE
2017-04-03 10:15:24,757 INFO application.ApplicationImpl (ApplicationImpl.java:transition(347)) - Removing container_e10_1484675915702_18333_01_000003 from application application_1484675915702_18333
2017-04-03 10:15:24,757 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:startContainerLogAggregation(512)) - Considering container container_e10_1484675915702_18333_01_000003 for log-aggregation
2017-04-03 10:15:24,758 INFO yarn.YarnShuffleService (YarnShuffleService.java:stopContainer(190)) - Stopping container container_e10_1484675915702_18333_01_000003
2017-04-03 10:15:24,758 INFO yarn.YarnShuffleService (YarnShuffleService.java:stopContainer(191)) - Stopping container container_e10_1484675915702_18333_01_000003
2017-04-03 10:15:26,338 INFO nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:removeOrTrackCompletedContainersFromContext(553)) - Removed completed containers from NM context: [container_e10_1484675915702_18333_01_000003]
2017-04-03 10:15:27,294 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(390)) - Stopping resource-monitoring for container_e10_1484675915702_18333_01_000003
2017-04-03 10:15:34,491 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doContainerLogAggregation(567)) - Uploading logs for container container_e10_1484675915702_18333_01_000003. Current good log dirs are /grid/1/hadoop/yarn/log,/grid/2/hadoop/yarn/log,/grid/3/hadoop/yarn/log,/grid/0/hadoop/yarn/log
2017-04-03 10:15:34,495 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(489)) - Deleting path : /grid/1/hadoop/yarn/log/application_1484675915702_18333/container_e10_1484675915702_18333_01_000003/syslog
2017-04-03 10:15:34,496 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(489)) - Deleting path : /grid/1/hadoop/yarn/log/application_1484675915702_18333/container_e10_1484675915702_18333_01_000003/directory.info
2017-04-03 10:15:34,496 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(489)) - Deleting path : /grid/1/hadoop/yarn/log/application_1484675915702_18333/container_e10_1484675915702_18333_01_000003/stdout
2017-04-03 10:15:34,496 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(489)) - Deleting path : /grid/1/hadoop/yarn/log/application_1484675915702_18333/container_e10_1484675915702_18333_01_000003/stderr
2017-04-03 10:15:34,496 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(489)) - Deleting path : /grid/1/hadoop/yarn/log/application_1484675915702_18333/container_e10_1484675915702_18333_01_000003/launch_container.sh
... View more
Labels:
04-07-2017
08:55 PM
Thank you! I've pulled most of my hair out researching this problem. I'm thankful you posted a workaround!
... View more