Explorer
Posts: 7
Registered: 11-16-2016

yarn application issue

Hello,

 

We are currently running three YARN applications on our cluster, and one of them uses Python processes.

 

However, this application shuts down by itself after running for a period of time.

 

Cluster: 2 NameNodes (name1, name2), 3 DataNodes

OS: CentOS 6.5

Cloudera version: 5.4.3

 

Here is the log message:

 

2016-11-15 16:44:57,741 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_1476862885544_0090_01_000010 has processes older than 1 iteration running over the configured limit. Limit=1610612736, current usage = 1623642112

2016-11-15 16:44:57,742 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=27837,containerID=container_1476862885544_0090_01_000010] is running beyond physical memory limits. Current usage: 1.5 GB of 1.5 GB physical memory used; 8.6 GB of 3.1 GB virtual memory used. Killing container.

Dump of the process-tree for container_1476862885544_0090_01_000010 :

    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

    |- 28127 28086 28086 27837 (python) 764 9 850362368 29134 /home/cloudera/anaconda2/bin/python -m pyspark.daemon

    |- 28117 28086 28086 27837 (python) 765 8 850448384 29061 /home/cloudera/anaconda2/bin/python -m pyspark.daemon

    |- 28086 27844 28086 27837 (python) 66 18 497029120 23074 /home/cloudera/anaconda2/bin/python -m pyspark.daemon

    |- 28140 28086 28086 27837 (python) 711 9 850243584 29106 /home/cloudera/anaconda2/bin/python -m pyspark.daemon

    |- 28114 28086 28086 27837 (python) 766 8 850366464 29138 /home/cloudera/anaconda2/bin/python -m pyspark.daemon

    |- 28133 28086 28086 27837 (python) 682 7 797290496 28631 /home/cloudera/anaconda2/bin/python -m pyspark.daemon

    |- 28124 28086 28086 27837 (python) 773 8 850370560 29136 /home/cloudera/anaconda2/bin/python -m pyspark.daemon

    |- 28130 28086 28086 27837 (python) 801 9 850362368 29134 /home/cloudera/anaconda2/bin/python -m pyspark.daemon

    |- 28136 28086 28086 27837 (python) 896 10 850489344 29084 /home/cloudera/anaconda2/bin/python -m pyspark.daemon

    |- 27837 27835 27837 27837 (bash) 1 0 108855296 362 /bin/bash -c LD_LIBRARY_PATH=/opt/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/lib/hadoop/lib/native::/opt/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/lib/hadoop/lib/native /usr/java/jdk1.7.0_67-cloudera/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms1024m -Xmx1024m -Djava.io.tmpdir=/mnt/disk/yarn/nm/usercache/cloudera/appcache/application_1476862885544_0090/container_1476862885544_0090_01_000010/tmp '-Dspark.shuffle.service.port=7337' '-Dspark.driver.port=33538' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1476862885544_0090/container_1476862885544_0090_01_000010 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@name1:33538/user/CoarseGrainedScheduler --executor-id 9 --hostname name1 --cores 1 --app-id application_1476862885544_0090 --user-class-path file:/mnt/disk/yarn/nm/usercache/cloudera/appcache/application_1476862885544_0090/container_1476862885544_0090_01_000010/__app__.jar 1> /var/log/hadoop-yarn/container/application_1476862885544_0090/container_1476862885544_0090_01_000010/stdout 2> /var/log/hadoop-yarn/container/application_1476862885544_0090/container_1476862885544_0090_01_000010/stderr

    |- 27844 27837 27837 27837 (java) 2893 240 1884332032 140537 /usr/java/jdk1.7.0_67-cloudera/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms1024m -Xmx1024m -Djava.io.tmpdir=/mnt/disk/yarn/nm/usercache/cloudera/appcache/application_1476862885544_0090/container_1476862885544_0090_01_000010/tmp -Dspark.shuffle.service.port=7337 -Dspark.driver.port=33538 -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1476862885544_0090/container_1476862885544_0090_01_000010 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@name1:33538/user/CoarseGrainedScheduler --executor-id 9 --hostname name1 --cores 1 --app-id application_1476862885544_0090 --user-class-path file:/mnt/disk/yarn/nm/usercache/cloudera/appcache/application_1476862885544_0090/container_1476862885544_0090_01_000010/__app__.jar

 

 

 

I think the problem may be a resource setting, but I don't know which parameter I should tune in Cloudera Manager.

Cloudera Employee
Posts: 281
Registered: 01-16-2014

Re: yarn application issue

Your issue is here:

running beyond physical memory limits. Current usage: 1.5 GB of 1.5 GB.

The container has run out of memory; you need to give it more room to run.

 

Since this is a Spark job and you are using pyspark the easiest solution would be to increase the overhead (spark.yarn.executor.memoryOverhead) that is used for this job in calculating the container size.
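For example, if you submit with spark-submit you could raise the overhead like this (a sketch only; the 1024 MB value and your_app.py are placeholders, tune them for your job):

    spark-submit --master yarn-cluster \
      --conf spark.yarn.executor.memoryOverhead=1024 \
      your_app.py

The overhead is added on top of spark.executor.memory when YARN sizes the container, and it is the part that has to cover the pyspark daemon processes you can see in your process-tree dump.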

 

Wilfred

Explorer
Posts: 7
Registered: 11-16-2016

Re: yarn application issue


Hello Wilfred, thanks for your reply.

 

I increased the Spark executor memory from 1 GB to 3 GB, but the application still shuts down by itself.

 

[attachment: 123..jpg.png]

 

 

And I find that the physical memory limit in the log messages has not changed (still 1.5 GB).
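If I calculate it correctly, the 1.5 GB limit looks like it still comes from the old 1 GB executor setting. Assuming the default overhead of max(384 MB, 7% of executor memory) and YARN rounding allocations up to a 512 MB increment (I am not sure these defaults match my cluster):

      1024 MB (spark.executor.memory)
    +  384 MB (default spark.yarn.executor.memoryOverhead)
    = 1408 MB -> rounded up to the next 512 MB step = 1536 MB = 1.5 GB

So maybe my new 3 GB setting is not being applied to the job at all?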

 

 

 

The log message is:

 

2016-11-22 08:59:11,232 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1476862885544_0131_01_000002 by user cloudera

2016-11-22 08:59:11,233 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1476862885544_0131

2016-11-22 08:59:11,233 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1476862885544_0131 transitioned from NEW to INITING

2016-11-22 08:59:11,233 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=cloudera        IP=192.168.1.12      OPERATION=Start Container Request    TARGET=ContainerManageImpl        RESULT=SUCCESS    APPID=application_1476862885544_0131        CONTAINERID=container_1476862885544_0131_01_000002

2016-11-22 08:59:11,281 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1476862885544_0131_01_000002 to application application_1476862885544_0131

2016-11-22 08:59:11,281 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1476862885544_0131 transitioned from INITING to RUNNING

2016-11-22 08:59:11,281 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1476862885544_0131_01_000002 transitioned from NEW to LOCALIZING

2016-11-22 08:59:11,281 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1476862885544_0131

2016-11-22 08:59:11,282 INFO org.apache.spark.network.yarn.YarnShuffleService: Initializing container container_1476862885544_0131_01_000002

2016-11-22 08:59:11,282 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_INIT for appId application_1476862885544_0131

2016-11-22 08:59:11,282 INFO org.apache.spark.network.yarn.YarnShuffleService: Initializing application application_1476862885544_0131

2016-11-22 08:59:11,282 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/user/cloudera/.sparkStaging/application_1476862885544_0131/onlineSimWithMultiline.py transitioned from INIT to DOWNLOADING

2016-11-22 08:59:11,282 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/user/cloudera/.sparkStaging/application_1476862885544_0131/log4j.properties transitioned from INIT to DOWNLOADING

2016-11-22 08:59:11,282 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/user/cloudera/.sparkStaging/application_1476862885544_0131/P211C transitioned from INIT to DOWNLOADING

2016-11-22 08:59:11,282 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/user/cloudera/.sparkStaging/application_1476862885544_0131/P211D transitioned from INIT to DOWNLOADING

2016-11-22 08:59:11,282 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/user/cloudera/.sparkStaging/application_1476862885544_0131/jumplib.zip transitioned from INIT to DOWNLOADING

2016-11-22 08:59:11,282 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1476862885544_0131_01_000002

2016-11-22 08:59:11,283 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /mnt/disk/yarn/nm/nmPrivate/container_1476862885544_0131_01_000002.tokens. Credentials list:

2016-11-22 08:59:11,285 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /mnt/disk/yarn/nm/nmPrivate/container_1476862885544_0131_01_000002.tokens to /mnt/disk/yarn/nm/usercache/cloudera/appcache/application_1476862885544_0131/container_1476862885544_0131_01_000002.tokens

2016-11-22 08:59:11,285 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Localizer CWD set to /mnt/disk/yarn/nm/usercache/cloudera/appcache/application_1476862885544_0131 = file:/mnt/disk/yarn/nm/usercache/cloudera/appcache/application_1476862885544_0131

2016-11-22 08:59:11,319 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/user/cloudera/.sparkStaging/application_1476862885544_0131/onlineSimWithMultiline.py(->/mnt/disk/yarn/nm/usercache/cloudera/filecache/6547/onlineSimWithMultiline.py) transitioned from DOWNLOADING to LOCALIZED

2016-11-22 08:59:11,334 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/user/cloudera/.sparkStaging/application_1476862885544_0131/log4j.properties(->/mnt/disk/yarn/nm/usercache/cloudera/filecache/6548/log4j.properties) transitioned from DOWNLOADING to LOCALIZED

2016-11-22 08:59:11,350 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/user/cloudera/.sparkStaging/application_1476862885544_0131/P211C(->/mnt/disk/yarn/nm/usercache/cloudera/filecache/6549/P211C) transitioned from DOWNLOADING to LOCALIZED

2016-11-22 08:59:11,365 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/user/cloudera/.sparkStaging/application_1476862885544_0131/P211D(->/mnt/disk/yarn/nm/usercache/cloudera/filecache/6550/P211D) transitioned from DOWNLOADING to LOCALIZED

2016-11-22 08:59:11,380 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/user/cloudera/.sparkStaging/application_1476862885544_0131/jumplib.zip(->/mnt/disk/yarn/nm/usercache/cloudera/filecache/6551/jumplib.zip) transitioned from DOWNLOADING to LOCALIZED

2016-11-22 08:59:11,380 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1476862885544_0131_01_000002 transitioned from LOCALIZING to LOCALIZED

2016-11-22 08:59:11,397 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1476862885544_0131_01_000002 transitioned from LOCALIZED to RUNNING

2016-11-22 08:59:11,400 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /mnt/disk/yarn/nm/usercache/cloudera/appcache/application_1476862885544_0131/container_1476862885544_0131_01_000002/default_container_executor.sh]

2016-11-22 08:59:11,479 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1476862885544_0131_01_000002

2016-11-22 08:59:11,549 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 27.2 MB of 1.5 GB physical memory used; 1.7 GB of 3.1 GB virtual memory used

2016-11-22 08:59:14,326 INFO org.apache.spark.network.shuffle.ExternalShuffleBlockManager: Registered executor AppExecId{appId=application_1476862885544_0131, execId=1} with ExecutorShuffleInfo{localDirs=[/mnt/disk/yarn/nm/usercache/cloudera/appcache/application_1476862885544_0131/blockmgr-b26818c7-4491-433a-b96b-3e31c7ee0ae6], subDirsPerLocalDir=64, shuffleManager=org.apache.spark.shuffle.sor

2016-11-22 08:59:14,864 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 371.7 MB of 1.5 GB physical memory used; 1.8 GB of 3.1 GB virtual memory used
2016-11-22 08:59:18,267 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 584.4 MB of 1.5 GB physical memory used; 2.9 GB of 3.1 GB virtual memory used
2016-11-22 08:59:21,530 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 675.1 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 08:59:24,801 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 673.5 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 08:59:28,066 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 673.5 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 08:59:31,332 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 673.6 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 08:59:34,638 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 673.6 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 08:59:37,962 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 673.6 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 08:59:41,291 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 673.6 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 08:59:44,622 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 673.6 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 08:59:47,935 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 673.6 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 08:59:51,207 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 673.6 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 08:59:54,488 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 673.6 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 08:59:57,740 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 673.6 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 09:00:01,015 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 673.6 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 09:00:04,281 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 673.6 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 09:00:07,576 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 673.6 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 09:00:10,879 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 673.8 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 09:00:14,181 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 673.9 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 09:00:17,454 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 674.0 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 09:00:20,744 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 674.1 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 09:00:23,996 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 674.2 MB of 1.5 GB physical memory used; 3.0 GB of 3.1 GB virtual memory used
2016-11-22 09:00:27,274 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 20949 for container-id container_1476862885544_0131_01_000002: 568.0 MB of 1.5 GB physical memory used; 2.3 GB of 3.1 GB virtual memory used

... (log lines omitted) ...

2016-11-22 23:33:07,028 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1476862885544_0131_01_000002 is : 1
2016-11-22 23:33:07,028 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1476862885544_0131_01_000002 and exit code: 1
2016-11-22 23:33:07,028 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1476862885544_0131_01_000002
2016-11-22 23:33:07,029 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1476862885544_0131_01_000002 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-11-22 23:33:07,029 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1476862885544_0131_01_000002
2016-11-22 23:33:07,047 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /mnt/disk/yarn/nm/usercache/cloudera/appcache/application_1476862885544_0131/container_1476862885544_0131_01_000002
2016-11-22 23:33:07,047 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=cloudera OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1476862885544_0131 CONTAINERID=container_1476862885544_0131_01_000002
2016-11-22 23:33:07,047 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1476862885544_0131_01_000002 transitioned from EXITED_WITH_FAILURE to DONE
2016-11-22 23:33:07,047 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1476862885544_0131_01_000002 from application application_1476862885544_0131
2016-11-22 23:33:07,048 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Considering container container_1476862885544_0131_01_000002 for log-aggregation
2016-11-22 23:33:07,048 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1476862885544_0131
2016-11-22 23:33:07,048 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_1476862885544_0131_01_000002
2016-11-22 23:33:08,337 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1476862885544_0131 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
2016-11-22 23:33:08,338 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /mnt/disk/yarn/nm/usercache/cloudera/appcache/application_1476862885544_0131
2016-11-22 23:33:08,338 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1476862885544_0131
2016-11-22 23:33:08,338 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping application application_1476862885544_0131
2016-11-22 23:33:08,338 INFO org.apache.spark.network.shuffle.ExternalShuffleBlockManager: Application application_1476862885544_0131 removed, cleanupLocalDirs = false
2016-11-22 23:33:08,338 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1476862885544_0131 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
2016-11-22 23:33:08,338 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Application just finished : application_1476862885544_0131
2016-11-22 23:33:08,381 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Uploading logs for container container_1476862885544_0131_01_000002. Current good log dirs are /var/log/hadoop-yarn/container
2016-11-22 23:33:08,386 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : /var/log/hadoop-yarn/container/application_1476862885544_0131/container_1476862885544_0131_01_000002/stderr
2016-11-22 23:33:08,389 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : /var/log/hadoop-yarn/container/application_1476862885544_0131/container_1476862885544_0131_01_000002/stdout
2016-11-22 23:33:08,462 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : /var/log/hadoop-yarn/container/application_1476862885544_0131
2016-11-22 23:33:09,555 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1476862885544_0131_01_000002

 

 

 

The error in the log is "Exit code from container container_1476862885544_0131_01_000002 is : 1", but I don't know what exit code 1 means.

Cloudera Employee
Posts: 281
Registered: 01-16-2014

Re: yarn application issue

Look at the container log for the specific container "container_1476862885544_0131_01_000002" mentioned in the log. You are looking at the NodeManager logs, which will only tell you the general life cycle of the container. You need to look at what happens inside the container.

Use the RM web UI, find the application application_1476862885544_0131, and drill into the containers that were run for the application.

 

Wilfred

Explorer
Posts: 7
Registered: 11-16-2016

Re: yarn application issue

Hello Wilfred,

 

I find that the container log files are removed when the application fails or is deleted.

 

Is there any way to keep the log files when an application fails?

 

 

Cloudera Employee
Posts: 281
Registered: 01-16-2014

Re: yarn application issue

Since you are using the CDH distribution I hope you are also using Cloudera Manager; it makes management much easier, and most things should have been configured for you.

 

You can access applications via the RM web UI; it should also have links through to the container logs.

For Spark applications you should also have the Spark History Server up and running. You should be able to find the application there, with links through to the container logs.

 

Logs are kept for days, even for failed and finished applications. If you have log aggregation turned on, you can use the yarn command to fetch the logs from HDFS (see the example below). Otherwise the logs are kept on the local drives of the NodeManagers and you can check there.
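For example, with log aggregation turned on, the standard yarn CLI pulls all aggregated container logs for a run by application id (using the id from your log snippet):

    yarn logs -applicationId application_1476862885544_0131

Run it as the user that submitted the application, or add -appOwner cloudera when running as a different user.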

 

Wilfred

 

 
