Yarn Application failed on out of memory

Master Collaborator

Hi,

 

I have a MapReduce job that failed with an out-of-memory error.

Log:

 

Application application_1484466365663_87038 failed 2 times due to AM Container for appattempt_1484466365663_87038_000002 exited with exitCode: -104
Diagnostics: Container [pid=7448,containerID=container_e29_1484466365663_87038_02_000001] is running beyond physical memory limits. Current usage: 3.0 GB of 3 GB physical memory used; 6.6 GB of 6.3 GB virtual memory used. Killing container.
Dump of the process-tree for container_e29_1484466365663_87038_02_000001 :

 

When I check the memory configured for the map tasks and for the Application Master in Cloudera Manager, it's 2 GB.

I also checked the job configuration in YARN, and it shows 2 GB:

 

mapreduce.map.memory.mb = 2GB

I have two questions:

1- How do I know whether this container is the AM container or the mapper container? Does the above error indicate that the AM memory was exceeded?

2- Why is it alerting on 3 GB when all of my configuration is 2 GB?

The solution itself is clear to me: I need to increase the memory.

8 REPLIES

Champion
Track down container container_e29_1484466365663_87038_02_000001. It is most likely a reducer. I say that because you said both the Map and AM container sizes were set to 2 GB; therefore the Reduce container size must be 3 GB. Well, in theory, the user launching the job could have overridden any of them.

What is the value of mapreduce.reduce.memory.mb?

Let's try another route as well: in the RM UI, does the job in question have any failed maps or reducers? If yes, drill down to the failed one and view the logs. If not, then the AM container OOM'd.

From my recollection, though, that is the message the AM logs about one of the containers it is responsible for.

Anyway, the short of it is that either the Reduce container size is 3 GB or the user set their own value to 3 GB, since the values in the cluster configs are only defaults.
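
For reference, these are the settings I would line up side by side; the numbers below are only illustrative, not what your cluster or job is actually using:

mapreduce.map.memory.mb = 2048             (map container size)
mapreduce.reduce.memory.mb = 3072          (reduce container size)
yarn.app.mapreduce.am.resource.mb = 2048   (AM container size)
mapreduce.map.java.opts = -Xmx1638m        (map JVM heap; must fit inside its container)
mapreduce.reduce.java.opts = -Xmx2458m     (reduce JVM heap)
yarn.app.mapreduce.am.command-opts = -Xmx1638m   (AM JVM heap)

Check them in the job's own configuration (the job.xml the RM UI links to for that application), not only in Cloudera Manager, since a submitter can override any of the cluster defaults.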

Master Collaborator
The job is a cleaner job that runs with only 1 mapper, and it's an Oozie launcher. Is the default for the Oozie launcher different from the job's?

oozie:launcher:T=java:W=hdfs-cleaner-wf:A=hdfs-cleaner:ID=0568638-160809023957851-oozie-clou-W

Here is a larger piece of the log:


Application application_1484466365663_87038 failed 2 times due to AM
Container for appattempt_1484466365663_87038_000002 exited with exitCode:
-104
For more detailed output, check application tracking page:
http://avor-mhc102.lpdomain.com:8088/proxy/application_1484466365663_87038/Then,
click on links to logs of each attempt.
Diagnostics: Container
[pid=7448,containerID=container_e29_1484466365663_87038_02_000001] is
running beyond physical memory limits. Current usage: 3.0 GB of 3 GB
physical memory used; 6.6 GB of 6.3 GB virtual memory used. Killing
container.
Dump of the process-tree for container_e29_1484466365663_87038_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 7448 7446 7448 7448 (bash) 2 2 108650496 304 /bin/bash -c
/jdk8//bin/java -Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=//hadoop/log/hadoop-yarn/container/application_1484466365663_87038/container_e29_1484466365663_87038_02_000001
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
-Djava.net.preferIPv4Stack=true -Xmx825955249
-Djava.net.preferIPv4Stack=true -Xmx4096m -Xmx4608m -Djava.io.tmpdir=./tmp
org.apache.hadoop.mapreduce.v2.app.MRAppMaster
1>/hadoop/log/hadoop-yarn/container/application_1484466365663_87038/container_e29_1484466365663_87038_02_000001/stdout
2>/hadoop/log/hadoop-yarn/container/application_1484466365663_87038/container_e29_1484466365663_87038_02_000001/stderr
|- 7613 7448 7448 7448 (java) 22034 2726 6976090112 788011 /jdk8//bin/java
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=/hadoop/log/hadoop-yarn/container/application_1484466365663_87038/container_e29_1484466365663_87038_02_000001
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
-Djava.net.preferIPv4Stack=true -Xmx825955249
-Djava.net.preferIPv4Stack=true -Xmx4096m -Xmx4608m -Djava.io.tmpdir=./tmp
org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.




Maps Total: 1
Total Tasks: 1



Champion
I'm not terribly familiar with Oozie, but I believe the launcher is separate from the actual job.

Also, from the log ("-Xmx4096m -Xmx4608m"), it is launching with a 4 GB container size and the heap is set to 3 GB.

Is it set in the Oozie job settings?

Master Collaborator
Yes, in Oozie it's 4 GB, you are right:

com.hadoop.platform.cleaner.CleanerJob
-Xmx4096m
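
That value comes from the java action in the workflow. Roughly, the action looks like this (reconstructed from memory, not a verbatim copy of the workflow; the action name matches the one in the launcher string above):

<action name="hdfs-cleaner">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <main-class>com.hadoop.platform.cleaner.CleanerJob</main-class>
        <java-opts>-Xmx4096m</java-opts>
    </java>
    <ok to="end"/>
    <error to="fail"/>
</action>

From what I read in the Oozie docs, the launcher container itself can be sized separately with the oozie.launcher.* prefixed properties (for example oozie.launcher.mapreduce.map.memory.mb) in the action's <configuration> block.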

Master Collaborator

My concern is why it's alerting on 3 GB of memory and not the mapper memory, which is 6 GB, or the Oozie launcher memory, which is 4 GB. Also, is it alerting on the mapper memory or the Application Master memory?

Champion

The map container memory was set to 4 GB. Presumably the heap value was set to 3 GB (newer versions have a ratio setting and auto-set the heap size from the container size; the default is 80%, and 3/4 is 75%). The 6 GB comes from virtual memory, which I recommend just disabling as it can cause weird OOM issues. The default virtual memory ratio is 2.1, which doesn't come out to 6 from 4. The log even states that the latter figure is the virtual memory size.

 

Set yarn.nodemanager.vmem-check-enabled = false to disable it.
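
In Cloudera Manager that goes into the NodeManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml. As raw XML the entry would look roughly like this (the second property is the 2.1 ratio mentioned above, shown only for reference):

<!-- turn off the NodeManager virtual memory check -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<!-- the virtual-to-physical memory ratio; left at its default here -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>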

New Contributor

How can I disable `yarn.nodemanager.vmem-check-enabled`? I tried adding it to the `NodeManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml`, but I don't see it in the yarn-site.xml on the nodes.

Super Collaborator
vmem checks have been disabled in CDH almost since their introduction. The vmem check is not stable and is highly dependent on the Linux version and distro. If you run CDH, you are already running with it disabled.

Wilfred