Created on 09-15-2014 08:39 AM - edited 09-16-2022 02:07 AM
We are running YARN on CDH 5.1 with 14 nodes, each with 6 GB of memory available to YARN. I understand this is not a lot of memory, but it is all we could put together. Most jobs complete without error, but a few of the larger MapReduce jobs fail with a Java out-of-heap-memory error. The jobs fail on a reduce task that either sorts or groups data. We recently upgraded from CDH 4.7 to CDH 5.1, and ALL of these jobs succeeded on MapReduce v1. Looking in the logs, I see that the application retried a few times before failing. Can you see anything wrong with the way the resources are configured?
Java Heap Size of NodeManager in Bytes                     | 1 GB
yarn.nodemanager.resource.memory-mb                        | 6 GB
yarn.scheduler.minimum-allocation-mb                       | 1 GB
yarn.scheduler.maximum-allocation-mb                       | 6 GB
yarn.app.mapreduce.am.resource.mb                          | 1.5 GB
yarn.nodemanager.container-manager.thread-count            | 20
yarn.resourcemanager.resource-tracker.client.thread-count  | 20
mapreduce.map.memory.mb                                    | 1.5 GB
mapreduce.reduce.memory.mb                                 | 3 GB
mapreduce.map.java.opts                                    | -Djava.net.preferIPv4Stack=true -Xmx1228m
mapreduce.reduce.java.opts                                 | -Djava.net.preferIPv4Stack=true -Xmx2457m
mapreduce.task.io.sort.factor                              | 5
mapreduce.task.io.sort.mb                                  | 512 MB
mapreduce.job.reduces                                      | 2
mapreduce.reduce.shuffle.parallelcopies                    | 4
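For reference, the heap settings above follow the common rule of thumb of sizing -Xmx to roughly 80% of the container size, leaving headroom for non-heap JVM overhead: 0.8 x 1536 MB is about 1228 MB for maps and 0.8 x 3072 MB is about 2457 MB for reduces. A minimal sketch of that pairing in mapred-site.xml (values mirror the table above; this is an illustration, not the poster's exact file):

<!-- Sketch: container size and JVM heap paired at ~80%.
     mapreduce.*.memory.mb is what YARN reserves for the container;
     -Xmx is the heap the JVM inside it may actually use. -->
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value> <!-- 3 GB container -->
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Djava.net.preferIPv4Stack=true -Xmx2457m</value> <!-- ~80% of 3072 MB -->
</property>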
One thing that might help: YARN runs 4 containers per node. Can this be reduced?
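As a rough check (my reading of the settings above, not an official formula): the scheduler packs containers by memory, so with yarn.nodemanager.resource.memory-mb = 6144 and 1.5 GB map containers, 6144 / 1536 = 4 concurrent containers per node. Raising the per-task container size reduces that count and gives each task more heap. A hypothetical example:

<!-- Hypothetical values: 6144 / 2048 = 3 containers per node instead of 4 -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Djava.net.preferIPv4Stack=true -Xmx1638m</value> <!-- ~80% of 2048 MB -->
</property>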
Created 09-16-2014 11:42 AM
Here is what I have. Are the map/reduce java opts being overwritten? There are two -Xmx entries (see the note after the configuration below).
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>mapreduce.job.split.metainfo.maxsize</name>
<value>10000000</value>
</property>
<property>
<name>mapreduce.job.counters.max</name>
<value>120</value>
</property>
<property>
<name>mapreduce.output.fileoutputformat.compress</name>
<value>false</value>
</property>
<property>
<name>mapreduce.output.fileoutputformat.compress.type</name>
<value>BLOCK</value>
</property>
<property>
<name>mapreduce.output.fileoutputformat.compress.codec</name>
<value>org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<property>
<name>zlib.compress.level</name>
<value>DEFAULT_COMPRESSION</value>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>5</value>
</property>
<property>
<name>mapreduce.map.sort.spill.percent</name>
<value>0.8</value>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>4</value>
</property>
<property>
<name>mapreduce.task.timeout</name>
<value>600000</value>
</property>
<property>
<name>mapreduce.client.submit.file.replication</name>
<value>4</value>
</property>
<property>
<name>mapreduce.job.reduces</name>
<value>2</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.map.speculative</name>
<value>false</value>
</property>
<property>
<name>mapreduce.reduce.speculative</name>
<value>false</value>
</property>
<property>
<name>mapreduce.job.reduce.slowstart.completedmaps</name>
<value>0.8</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>blvdevhdp05.ds-iq.corp:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>blvdevhdp05.ds-iq.corp:19888</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>1536</value>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.cpu-vcores</name>
<value>1</value>
</property>
<property>
<name>mapreduce.job.ubertask.enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.app.mapreduce.am.command-opts</name>
<value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Djava.net.preferIPv4Stack=true -Xmx768m -Xmx825955249</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Djava.net.preferIPv4Stack=true -Xmx1280m -Xmx825955249</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1280</value>
</property>
<property>
<name>mapreduce.map.cpu.vcores</name>
<value>1</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>1792</value>
</property>
<property>
<name>mapreduce.reduce.cpu.vcores</name>
<value>1</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH</value>
</property>
<property>
<name>mapreduce.admin.user.env</name>
<value>LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native:$JAVA_LIBRARY_PATH</value>
</property>
<property>
<name>mapreduce.shuffle.max.connections</name>
<value>80</value>
</property>
</configuration>
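For later readers: with HotSpot JVMs, when duplicate flags are passed the last -Xmx typically wins, so "-Xmx1280m -Xmx825955249" yields a heap of 825955249 bytes, roughly 787 MB. That silently caps the reduce heap well below the intended 1280m, which matches the out-of-heap failures on reduce tasks. A minimal sketch of the corrected opts, assuming the stray bytes value is simply removed (the resolution below confirms that was the fix):

<!-- Corrected: a single -Xmx per opts string; the stray -Xmx825955249 removed -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Djava.net.preferIPv4Stack=true -Xmx768m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Djava.net.preferIPv4Stack=true -Xmx1280m</value>
</property>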
Created 09-17-2014 11:25 AM
You pointed out the problem, and I removed the -Xmx825955249 from where I had entered it in Cloudera Manager; I had been using the wrong field to update the value. Thank you so much for sticking with me and helping me resolve this issue! The jobs now succeed!
Kevin Verhoeven