Jobs fail in Yarn with out of Java heap memory error
Created on 09-15-2014 08:39 AM - edited 09-16-2022 02:07 AM
We are running YARN on CDH 5.1 with 14 nodes, each with 6 GB of memory. I understand this is not a lot of memory, but it is all we could put together. Most jobs complete without error, but a few of the larger MapReduce jobs fail with an out of Java heap memory error. The jobs fail on a Reduce task that either sorts or groups data. We recently upgraded to CDH 5.1 from CDH 4.7, and ALL of these jobs succeeded on MapReduce v1. Looking in the logs, I see that the Application has retried a few times before failing. Can you see anything wrong with the way the resources are configured?
| Property | Value |
| --- | --- |
| Java Heap Size of NodeManager in Bytes | 1 GB |
| yarn.nodemanager.resource.memory-mb | 6 GB |
| yarn.scheduler.minimum-allocation-mb | 1 GB |
| yarn.scheduler.maximum-allocation-mb | 6 GB |
| yarn.app.mapreduce.am.resource.mb | 1.5 GB |
| yarn.nodemanager.container-manager.thread-count | 20 |
| yarn.resourcemanager.resource-tracker.client.thread-count | 20 |
| mapreduce.map.memory.mb | 1.5 GB |
| mapreduce.reduce.memory.mb | 3 GB |
| mapreduce.map.java.opts | -Djava.net.preferIPv4Stack=true -Xmx1228m |
| mapreduce.reduce.java.opts | -Djava.net.preferIPv4Stack=true -Xmx2457m |
| mapreduce.task.io.sort.factor | 5 |
| mapreduce.task.io.sort.mb | 512 MB |
| mapreduce.job.reduces | 2 |
| mapreduce.reduce.shuffle.parallelcopies | 4 |
One thing that might help: YARN runs 4 containers per node. Can this be reduced? (A rough sketch of the container math follows below.)
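For reference, a rough sketch of where the 4 containers per node likely come from, assuming memory is the only scheduling constraint and mostly map containers running:

    containers per node ≈ floor(yarn.nodemanager.resource.memory-mb / mapreduce.map.memory.mb)
                        = floor(6 GB / 1.5 GB)
                        = 4

Reducing the count would mean lowering yarn.nodemanager.resource.memory-mb or raising the per-container memory sizes.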
Created 09-16-2014 11:42 AM
Here is what I have. Are the map/reduce java opts being overwritten? There are two -Xmx entries.
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>mapreduce.job.split.metainfo.maxsize</name>
<value>10000000</value>
</property>
<property>
<name>mapreduce.job.counters.max</name>
<value>120</value>
</property>
<property>
<name>mapreduce.output.fileoutputformat.compress</name>
<value>false</value>
</property>
<property>
<name>mapreduce.output.fileoutputformat.compress.type</name>
<value>BLOCK</value>
</property>
<property>
<name>mapreduce.output.fileoutputformat.compress.codec</name>
<value>org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<property>
<name>zlib.compress.level</name>
<value>DEFAULT_COMPRESSION</value>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>5</value>
</property>
<property>
<name>mapreduce.map.sort.spill.percent</name>
<value>0.8</value>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>4</value>
</property>
<property>
<name>mapreduce.task.timeout</name>
<value>600000</value>
</property>
<property>
<name>mapreduce.client.submit.file.replication</name>
<value>4</value>
</property>
<property>
<name>mapreduce.job.reduces</name>
<value>2</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.map.speculative</name>
<value>false</value>
</property>
<property>
<name>mapreduce.reduce.speculative</name>
<value>false</value>
</property>
<property>
<name>mapreduce.job.reduce.slowstart.completedmaps</name>
<value>0.8</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>blvdevhdp05.ds-iq.corp:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>blvdevhdp05.ds-iq.corp:19888</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>1536</value>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.cpu-vcores</name>
<value>1</value>
</property>
<property>
<name>mapreduce.job.ubertask.enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.app.mapreduce.am.command-opts</name>
<value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Djava.net.preferIPv4Stack=true -Xmx768m -Xmx825955249</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Djava.net.preferIPv4Stack=true -Xmx1280m -Xmx825955249</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1280</value>
</property>
<property>
<name>mapreduce.map.cpu.vcores</name>
<value>1</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>1792</value>
</property>
<property>
<name>mapreduce.reduce.cpu.vcores</name>
<value>1</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH</value>
</property>
<property>
<name>mapreduce.admin.user.env</name>
<value>LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native:$JAVA_LIBRARY_PATH</value>
</property>
<property>
<name>mapreduce.shuffle.max.connections</name>
<value>80</value>
</property>
</configuration>
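One way to confirm what a finished job actually ran with: the JobHistory web UI (blvdevhdp05.ds-iq.corp:19888, per mapreduce.jobhistory.webapp.address above) has a Configuration page for each job, and looking up mapreduce.reduce.java.opts there shows the full merged value, duplicate -Xmx flags and all.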
Created 09-16-2014 12:05 PM
It seems to me that if parallelcopies is increased, data would be written out of memory and to files more quickly. But I might be wrong...
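For context, a rough sketch of the map-side spill math from the config above (standard MapReduce behavior, nothing cluster-specific):

    spill threshold ≈ mapreduce.task.io.sort.mb × mapreduce.map.sort.spill.percent
                    = 512 MB × 0.8
                    ≈ 410 MB

A background thread starts spilling the sort buffer to disk once it passes that threshold. Note the 512 MB buffer also has to fit inside the map task heap, which the duplicate -Xmx above caps at roughly 825 MB.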
Created 09-17-2014 10:12 AM
<value>-Djava.net.preferIPv4Stack=true -Xmx1280m -Xmx825955249</value>
This limits the heap to ~825 MB: -Xmx825955249 is in bytes, and most JVMs resolve duplicate -Xmx flags by taking the last one, so the -Xmx1280m is ignored. That is nowhere close to the 3 GB you intended. You should find out where you set this in Cloudera Manager and change it.
Do that before you play with parallelcopies. But to answer your question: yes, increasing it will raise CPU, memory, and network usage, and it could lead to more disk spills and slow down your job.
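If you want to confirm which flag your JVM honors, here is a minimal throwaway check (HeapCheck is just a standalone class for illustration, not part of Hadoop):

public class HeapCheck {
    public static void main(String[] args) {
        // Prints the maximum heap the JVM actually applied, in bytes.
        // Launched as:  java -Xmx1280m -Xmx825955249 HeapCheck
        // most JVMs honor the last -Xmx, so this prints a value near
        // 825955249 (~825 MB) rather than 1280 MB. (maxMemory() can read
        // slightly below the -Xmx setting due to survivor-space accounting.)
        System.out.println(Runtime.getRuntime().maxMemory());
    }
}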
Created 09-17-2014 11:25 AM
You pointed out the problem and I removed the -Xmx825955249 from where I had entered it in Cloudera Manager. I was using the wrong field to update the value. Thank you so much for sticking with me and helping me resolve this issue! The jobs now succeed!
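For anyone who finds this thread later: the fix leaves a single -Xmx in each opts string, along the lines of the values from my original table (each -Xmx is roughly 80% of the matching mapreduce.*.memory.mb, leaving headroom for non-heap JVM overhead):

    mapreduce.map.java.opts    = -Djava.net.preferIPv4Stack=true -Xmx1228m
    mapreduce.reduce.java.opts = -Djava.net.preferIPv4Stack=true -Xmx2457m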
Kevin Verhoeven
