Created on 09-15-2014 08:39 AM - edited 09-16-2022 02:07 AM
We are running YARN on CDH 5.1 with 14 nodes, each with 6 GB of memory. I understand this is not a lot of memory, but it is all we could put together. Most jobs complete without error, but a few of the larger MapReduce jobs fail with a Java out-of-heap-memory error. The jobs fail on a reduce task that either sorts or groups data. We recently upgraded to CDH 5.1 from CDH 4.7, and ALL of these jobs succeeded on MapReduce v1. Looking in the logs, I see that the application has retried a few times before failing. Can you see anything wrong with the way the resources are configured?
Property | Value
Java Heap Size of NodeManager in Bytes | 1 GB
yarn.nodemanager.resource.memory-mb | 6 GB
yarn.scheduler.minimum-allocation-mb | 1 GB
yarn.scheduler.maximum-allocation-mb | 6 GB
yarn.app.mapreduce.am.resource.mb | 1.5 GB
yarn.nodemanager.container-manager.thread-count | 20
yarn.resourcemanager.resource-tracker.client.thread-count | 20
mapreduce.map.memory.mb | 1.5 GB
mapreduce.reduce.memory.mb | 3 GB
mapreduce.map.java.opts | -Djava.net.preferIPv4Stack=true -Xmx1228m
mapreduce.reduce.java.opts | -Djava.net.preferIPv4Stack=true -Xmx2457m
mapreduce.task.io.sort.factor | 5
mapreduce.task.io.sort.mb | 512 MB
mapreduce.job.reduces | 2
mapreduce.reduce.shuffle.parallelcopies | 4
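For reference, this is roughly how the MapReduce-side values above would look in mapred-site.xml if they were set there directly rather than through Cloudera Manager (memory sizes are plain MB integers in the XML, so 1.5 GB = 1536 and 3 GB = 3072). This is only a sketch of the current settings, not a proposed change:

<!-- -Xmx is roughly 80% of the container size (1228 ~= 0.8 * 1536, 2457 ~= 0.8 * 3072) -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Djava.net.preferIPv4Stack=true -Xmx1228m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Djava.net.preferIPv4Stack=true -Xmx2457m</value>
</property>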
One thing that might help: YARN runs 4 containers per node. Can this be reduced?
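If I understand the scheduling right, the 4 simply falls out of the numbers above: 6144 MB per NodeManager divided by 1.5 GB map containers allows 4 concurrent maps (and the 3 GB reducers would be 2 per node). Assuming that is correct, the count could be lowered by making each container larger or the NodeManager budget smaller; a purely illustrative mapred-site.xml change:

<!-- Illustrative only: larger map containers mean fewer run concurrently (6144 / 2048 = 3 per node);
     the -Xmx in mapreduce.map.java.opts would need to grow to match, roughly 80% of the container -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>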
Created 09-15-2014 09:33 AM
What are your MR1 settings? Did reducers get -Xmx2457m on MR1 as well?
Also, the AM memory at 1.5 GB is a bit high. You could probably cut that to 1 GB.
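If the cluster is managed outside Cloudera Manager, a minimal sketch of that change in mapred-site.xml would look something like this (the 768m AM heap is just an illustrative value chosen to sit comfortably inside the smaller container):

<!-- ApplicationMaster container size, down from 1536 MB to 1024 MB -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>
<!-- Keep the AM heap below the container size; 768m here is illustrative -->
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx768m</value>
</property>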
Created 09-15-2014 10:30 AM
Thanks bcwalrus, very good question:
In MRv1, we set the Java Heap Size of TaskTracker in Bytes to 600 MB. Do you think I've set this too high in MRv2?
I'll cut the AM memory down to 1 GB; that's good advice and will save me some memory on each node.
Kevin
Created 09-16-2014 10:23 AM
From the YARN logs I can see that YARN believes a huge amount of virtual memory is available before the job is killed. Why is it using so much virtual memory? Where is this set?
2014-09-16 10:18:30,803 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 51870 for container-id container_1410882800578_0001_01_000001: 797.0 MB of 2.5 GB physical memory used; 1.8 GB of 5.3 GB virtual memory used
2014-09-16 10:18:33,829 INFO
...
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1410882800578_0005_01_000048
2014-09-16 10:18:34,431 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=admin IP=192.168.210.251 OPERATION=Stop Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1410882800578_0005 CONTAINERID=container_1410882800578_0005_01_000048
2014-09-16 10:18:34,432 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1410882800578_0005_01_000048 transitioned from RUNNING to KILLING
2014-09-16 10:18:34,433 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1410882800578_0005_01_000048
2014-09-16 10:18:34,462 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1410882800578_0005_01_000048 is : 143
2014-09-16 10:18:34,550 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1410882800578_0005_01_000048 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2014-09-16 10:18:34,553 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /space1/yarn/nm/usercache/admin/appcache/application_1410882800578_0005/container_1410882800578_0005_01_000048
2014-09-16 10:18:34,556 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /space2/yarn/nm/usercache/admin/appcache/application_1410882800578_0005/container_1410882800578_0005_01_000048
2014-09-16 10:18:34,558 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=admin OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1410882800578_0005 CONTAINERID=container_1410882800578_0005_01_000048
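For what it's worth, the 5.3 GB in the first line looks like the per-container virtual-memory allowance rather than memory actually in use: the 2.5 GB physical limit multiplied by yarn.nodemanager.vmem-pmem-ratio, whose default is 2.1, gives about 5.25 GB, and the log shows only 1.8 GB of it used. Both the ratio and the check itself are NodeManager settings in yarn-site.xml; the values below are the stock defaults, shown only to indicate where this is configured:

<!-- Virtual memory allowed per container = physical limit * this ratio (2.5 GB * 2.1 ~= 5.3 GB) -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>
<!-- Enforcement of the virtual-memory limit; set to false to disable the check -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>true</value>
</property>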