<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question MapReduce application failed with OutOfMemoryError in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/MapReduce-application-failed-with-OutOfMemoryError/m-p/55708#M36628</link>
Hello! Since my cluster started running more Spark and MapReduce jobs than before, a high-resource-consuming MapReduce application crashes randomly.
The error is always the same (OutOfMemoryError).
My cluster nodes have 128 GB of RAM and 32 cores each; Kerberos and HA for HDFS and YARN are enabled (we use YARN instead of MapReduce v1).

App details:
- Total maps: 18922
- Total reducers: 983
- Approx. allocated CPU vCores: 283
- Approx. allocated memory MB: 3467264

Below I pasted:
- The log from my History Server.
- The memory-related properties from my yarn-site.xml, mapred-site.xml and hadoop-env.sh.
- Java info per node.

-----------------------------------------------------------------------------------------------------

History Server log (from one YARN NodeManager where the app failed)

Log Type: stderr
Log Upload Time: Sun Jun 11 13:49:24 +0000 2017
Log Length: 25992448
Showing 4096 bytes of 25992448 total.

v2.runtime.XMLSerializer.leafElement(XMLSerializer.java:327)
at com.sun.xml.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl$StringImplImpl.writeLeafElement(RuntimeBuiltinLeafInfoImpl.java:1045)
at com.sun.xml.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl$StringImplImpl.writeLeafElement(RuntimeBuiltinLeafInfoImpl.java:1024)
at com.sun.xml.bind.v2.model.impl.RuntimeEnumLeafInfoImpl.writeLeafElement(RuntimeEnumLeafInfoImpl.java:169)
at com.sun.xml.bind.v2.model.impl.RuntimeEnumLeafInfoImpl.writeLeafElement(RuntimeEnumLeafInfoImpl.java:69)
at com.sun.xml.bind.v2.runtime.reflect.TransducedAccessor$CompositeTransducedAccessorImpl.writeLeafElement(TransducedAccessor.java:256)
at com.sun.xml.bind.v2.runtime.property.SingleElementLeafProperty.serializeBody(SingleElementLeafProperty.java:128)
at com.sun.xml.bind.v2.runtime.ClassBeanInfoImpl.serializeBody(ClassBeanInfoImpl.java:344)
at com.sun.xml.bind.v2.runtime.XMLSerializer.childAsXsiType(XMLSerializer.java:700)
at com.sun.xml.bind.v2.runtime.property.ArrayElementNodeProperty.serializeItem(ArrayElementNodeProperty.java:69)
at com.sun.xml.bind.v2.runtime.property.ArrayElementProperty.serializeListBody(ArrayElementProperty.java:172)
at com.sun.xml.bind.v2.runtime.property.ArrayERProperty.serializeBody(ArrayERProperty.java:159)
at com.sun.xml.bind.v2.runtime.ClassBeanInfoImpl.serializeBody(ClassBeanInfoImpl.java:344)
at com.sun.xml.bind.v2.runtime.XMLSerializer.childAsSoleContent(XMLSerializer.java:597)
at com.sun.xml.bind.v2.runtime.ClassBeanInfoImpl.serializeRoot(ClassBeanInfoImpl.java:328)
at com.sun.xml.bind.v2.runtime.XMLSerializer.childAsRoot(XMLSerializer.java:498)
at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:320)
... 39 more
Caused by: org.mortbay.jetty.EofException
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)
at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)
at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:651)
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:580)
at com.sun.jersey.spi.container.servlet.WebComponent$Writer.write(WebComponent.java:307)
at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:253)
at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211)
at java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:146)
at com.sun.jersey.spi.container.ContainerResponse$CommittingOutputStream.write(ContainerResponse.java:134)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
at org.codehaus.jackson.impl.WriterBasedGenerator._flushBuffer(WriterBasedGenerator.java:1812)
at org.codehaus.jackson.impl.WriterBasedGenerator._writeString(WriterBasedGenerator.java:987)
at org.codehaus.jackson.impl.WriterBasedGenerator.writeString(WriterBasedGenerator.java:448)
at com.sun.jersey.json.impl.writer.JacksonStringMergingGenerator.flushPreviousString(JacksonStringMergingGenerator.java:311)
at com.sun.jersey.json.impl.writer.JacksonStringMergingGenerator.writeFieldName(JacksonStringMergingGenerator.java:139)
at com.sun.jersey.json.impl.writer.Stax2JacksonWriter.writeStartElement(Stax2JacksonWriter.java:183)
... 58 more
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:492)
at org.mortbay.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:171)
at org.mortbay.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:221)
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:725)
... 77 more
Halting due to Out Of Memory Error...

Log Type: stdout
Log Upload Time: Sun Jun 11 13:49:24 +0000 2017
Log Length: 0

Log Type: syslog
Log Upload Time: Sun Jun 11 13:49:24 +0000 2017
Log Length: 24905123
Showing 4096 bytes of 24905123 total.
che.hadoop.mapred.TaskUmbilicalProtocol
2017-06-11 07:00:14,078 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for job_1496754929367_0270 (auth:TOKEN) for protocol=interface org.apache.hadoop.mapred.TaskUmbilicalProtocol
2017-06-11 07:00:14,079 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for job_1496754929367_0270 (auth:TOKEN) for protocol=interface org.apache.hadoop.mapred.TaskUmbilicalProtocol
2017-06-11 07:00:14,080 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for job_1496754929367_0270 (auth:TOKEN) for protocol=interface org.apache.hadoop.mapred.TaskUmbilicalProtocol
2017-06-11 07:00:14,080 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for job_1496754929367_0270 (auth:TOKEN) for protocol=interface org.apache.hadoop.mapred.TaskUmbilicalProtocol
2017-06-11 07:00:14,081 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1496754929367_0270 (auth:SIMPLE)
2017-06-11 07:00:14,081 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for job_1496754929367_0270 (auth:TOKEN) for protocol=interface org.apache.hadoop.mapred.TaskUmbilicalProtocol
2017-06-11 07:00:14,082 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for job_1496754929367_0270 (auth:TOKEN) for protocol=interface org.apache.hadoop.mapred.TaskUmbilicalProtocol
2017-06-11 07:00:14,082 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for job_1496754929367_0270 (auth:TOKEN) for protocol=interface org.apache.hadoop.mapred.TaskUmbilicalProtocol
2017-06-11 07:00:14,082 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1496754929367_0270 (auth:SIMPLE)
2017-06-11 07:00:14,083 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for job_1496754929367_0270 (auth:TOKEN) for protocol=interface org.apache.hadoop.mapred.TaskUmbilicalProtocol
2017-06-11 07:00:14,083 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for job_1496754929367_0270 (auth:TOKEN) for protocol=interface org.apache.hadoop.mapred.TaskUmbilicalProtocol
2017-06-11 07:00:14,974 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Socket Reader #1 for port 36503,5,main] threw an Error.
Shutting down now...
java.lang.OutOfMemoryError: GC overhead limit exceeded
at com.google.protobuf.CodedInputStream.<init>(CodedInputStream.java:573)
at com.google.protobuf.CodedInputStream.newInstance(CodedInputStream.java:55)
at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:219)
at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:912)
at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:267)
at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:290)
at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:926)
at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:296)
at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:918)
at org.apache.hadoop.ipc.Server$Connection.decodeProtobufFromStream(Server.java:1994)
at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1774)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1548)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:774)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:647)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:618)
2017-06-11 07:00:14,980 INFO org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException

-----------------------------------------------------------------------------------------------------

yarn-site.xml

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>122635</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>30</value>
</property>
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>100.0</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>102400</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>12288</value>
</property>
<property>
  <name>local.child.java.opts</name>
  <value>-server -Djava.net.preferIPv4Stack=true</value>
</property>
<property>
  <name>local.map.task.jvm.heap.mb</name>
  <value>4300</value>
</property>
<property>
  <name>local.reduce.task.jvm.heap.mb</name>
  <value>8601</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>${local.child.java.opts} -Xmx${local.map.task.jvm.heap.mb}m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>${local.child.java.opts} -Xmx${local.reduce.task.jvm.heap.mb}m</value>
</property>

-----------------------------------------------------------------------------------------------------

mapred-site.xml

<property>
  <name>local.child.java.opts</name>
  <value>-server -Djava.net.preferIPv4Stack=true</value>
</property>
<property>
  <name>local.map.task.jvm.heap.mb</name>
  <value>4300</value>
</property>
<property>
  <name>local.reduce.task.jvm.heap.mb</name>
  <value>8601</value>
</property>
<property>
  <name>mapred.map.child.java.opts</name>
  <value>${local.child.java.opts} -Xmx${local.map.task.jvm.heap.mb}m</value>
</property>
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>${local.child.java.opts} -Xmx${local.reduce.task.jvm.heap.mb}m</value>
</property>
<property>
  <name>mapred.job.map.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapred.job.reduce.memory.mb</name>
  <value>12288</value>
</property>
<property>
  <name>mapred.cluster.max.map.memory.mb</name>
  <value>15360</value>
</property>
<property>
  <name>mapred.cluster.max.reduce.memory.mb</name>
  <value>15360</value>
</property>
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.999</value>
</property>
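To make the numbers above easier to follow, here is a small back-of-the-envelope sketch (Python, purely illustrative, not part of the job). It only uses the values pasted above; the variable names are just labels for this sketch, and it assumes one container per task and ignores the ApplicationMaster and any non-YARN memory on the node.

# Back-of-the-envelope check using only the values quoted above (illustrative only).
node_mem_mb      = 122635   # yarn.nodemanager.resource.memory-mb
node_vcores      = 30       # yarn.nodemanager.resource.cpu-vcores
map_container_mb = 6144     # mapreduce.map.memory.mb
map_heap_mb      = 4300     # local.map.task.jvm.heap.mb (-Xmx of a map task)
red_container_mb = 12288    # mapreduce.reduce.memory.mb
red_heap_mb      = 8601     # local.reduce.task.jvm.heap.mb (-Xmx of a reduce task)

print(f"map heap / map container:        {map_heap_mb / map_container_mb:.2f}")  # ~0.70
print(f"reduce heap / reduce container:  {red_heap_mb / red_container_mb:.2f}")  # ~0.70
print(f"max concurrent maps per node:    {min(node_mem_mb // map_container_mb, node_vcores)}")  # 19
print(f"max concurrent reduces per node: {min(node_mem_mb // red_container_mb, node_vcores)}")  # 9

With the substitutions shown in the configs, a reduce task JVM would be launched with "-server -Djava.net.preferIPv4Stack=true -Xmx8601m", which matches the -Xmx8601m visible in the ps output quoted further below.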
"-XX:+UseConcMarkSweepGC"&lt;BR /&gt;add_opt HADOOP_NAMENODE_OPTS "-Xmx8192m" #"&lt;BR /&gt;add_opt HADOOP_NAMENODE_OPTS "-Dsyslog.tag=namenode"&lt;BR /&gt;add_opt HADOOP_NAMENODE_OPTS "-Ddaemon.logger=INFO,syslog"&lt;BR /&gt;add_opt HADOOP_NAMENODE_OPTS "${JMX_OPTS}=7191"&lt;BR /&gt;add_opt HADOOP_DATANODE_OPTS "-XX:+UseParNewGC"&lt;BR /&gt;add_opt HADOOP_DATANODE_OPTS "-XX:+UseConcMarkSweepGC"&lt;BR /&gt;add_opt HADOOP_DATANODE_OPTS "-Xmx4096m" #"&lt;BR /&gt;add_opt HADOOP_DATANODE_OPTS "-Dsyslog.tag=datanode"&lt;BR /&gt;add_opt HADOOP_DATANODE_OPTS "-Ddaemon.logger=INFO,syslog"&lt;BR /&gt;add_opt HADOOP_DATANODE_OPTS "${JMX_OPTS}=7194"&lt;BR /&gt;add_opt HADOOP_JOURNALNODE_OPTS "-XX:+UseParNewGC"&lt;BR /&gt;add_opt HADOOP_JOURNALNODE_OPTS "-XX:+UseConcMarkSweepGC"&lt;BR /&gt;add_opt HADOOP_JOURNALNODE_OPTS "-Xmx1024m" #"&lt;BR /&gt;add_opt HADOOP_JOURNALNODE_OPTS "-Dsyslog.tag=journalnode"&lt;BR /&gt;add_opt HADOOP_JOURNALNODE_OPTS "-Ddaemon.logger=INFO,syslog"&lt;BR /&gt;add_opt HADOOP_JOURNALNODE_OPTS "${JMX_OPTS}=7203"&lt;BR /&gt;add_opt HADOOP_JOB_HISTORYSERVER_OPTS "-XX:+UseParNewGC"&lt;BR /&gt;add_opt HADOOP_JOB_HISTORYSERVER_OPTS "-XX:+UseConcMarkSweepGC"&lt;BR /&gt;add_opt HADOOP_JOB_HISTORYSERVER_OPTS "-Xmx1024m" #"&lt;BR /&gt;add_opt HADOOP_JOB_HISTORYSERVER_OPTS "-Dsyslog.tag=historyserver"&lt;BR /&gt;add_opt HADOOP_JOB_HISTORYSERVER_OPTS "-Ddaemon.logger=INFO,syslog"&lt;BR /&gt;add_opt HADOOP_JOB_HISTORYSERVER_OPTS "${JMX_OPTS}=7201"&lt;/P&gt;&lt;P&gt;add_opt YARN_RESOURCEMANAGER_OPTS "-XX:+UseParNewGC"&lt;BR /&gt;add_opt YARN_RESOURCEMANAGER_OPTS "-XX:+UseConcMarkSweepGC"&lt;BR /&gt;add_opt YARN_RESOURCEMANAGER_OPTS "-Xmx1024m" #"&lt;BR /&gt;add_opt YARN_RESOURCEMANAGER_OPTS "-Dsyslog.tag=resourcemanager"&lt;BR /&gt;add_opt YARN_RESOURCEMANAGER_OPTS "-Ddaemon.logger=INFO,syslog"&lt;BR /&gt;add_opt YARN_RESOURCEMANAGER_OPTS "${JMX_OPTS}=7204"&lt;BR /&gt;add_opt YARN_PROXYSERVER_OPTS "-XX:+UseParNewGC"&lt;BR /&gt;add_opt YARN_PROXYSERVER_OPTS "-XX:+UseConcMarkSweepGC"&lt;BR /&gt;add_opt YARN_PROXYSERVER_OPTS "-Xmx1024m" #"&lt;BR /&gt;add_opt YARN_PROXYSERVER_OPTS "-Dsyslog.tag=proxyserver"&lt;BR /&gt;add_opt YARN_PROXYSERVER_OPTS "-Ddaemon.logger=INFO,syslog"&lt;BR /&gt;add_opt YARN_PROXYSERVER_OPTS "${JMX_OPTS}=7202"&lt;BR /&gt;add_opt YARN_NODEMANAGER_OPTS "-XX:+UseParNewGC"&lt;BR /&gt;add_opt YARN_NODEMANAGER_OPTS "-XX:+UseConcMarkSweepGC"&lt;BR /&gt;add_opt YARN_NODEMANAGER_OPTS "-Xmx1024m" #"&lt;BR /&gt;add_opt YARN_NODEMANAGER_OPTS "-Dsyslog.tag=nodemanager"&lt;BR /&gt;add_opt YARN_NODEMANAGER_OPTS "-Ddaemon.logger=INFO,syslog"&lt;BR /&gt;add_opt YARN_NODEMANAGER_OPTS "${JMX_OPTS}=7205"&lt;BR /&gt;# Specific command-specific options&lt;BR /&gt;add_opt HADOOP_NAMENODE_OPTS "-Dhdfs.audit.logger=INFO,RFAAUDIT"&lt;BR /&gt;add_opt HADOOP_JOBTRACKER_OPTS "-Dmapred.audit.logger=INFO,MRAUDIT"&lt;BR /&gt;add_opt HADOOP_JOBTRACKER_OPTS "-Dmapred.jobsummary.logger=INFO,JSA"&lt;BR /&gt;add_opt HADOOP_TASKTRACKER_OPTS "-Dsecurity.audit.logger=ERROR,console"&lt;BR /&gt;add_opt HADOOP_TASKTRACKER_OPTS "-Dmapred.audit.logger=ERROR,console"&lt;BR /&gt;add_opt HADOOP_SECONDARYNAMENODE_OPTS "-Dhdfs.audit.logger=INFO,RFAAUDIT"&lt;BR /&gt;&lt;BR /&gt;IMPORTANT: There's a file used for the java mapred application where the following properties are set:&lt;BR /&gt;"mapreduce.reduce.slowstart.completed.maps" 0.95,&lt;BR /&gt;"mapreduce.job.reduces" 983,&lt;BR /&gt;"mapreduce.reduce.shuffle.input.buffer.percent" 0.5,&lt;BR /&gt;&lt;BR 
-----------------------------------------------------------------------------------------------------

Some Java settings:

VM memory/heap results from "$ java -XshowSettings:all":

VM settings:
Max. Heap Size (Estimated): 26.52G
Ergonomics Machine Class: server
Using VM: OpenJDK 64-Bit Server VM

Java process details from one randomly selected node; at the moment this node is running reduce tasks for the same application that failed before:

Number of processes = 12
Memory usage per process = 5493.24 MB
Total memory usage = 65918.9 MB

From running ps -aux | grep app_id I got: ........-Xmx8601m.......
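Two hedged notes on the figures above: the 26.52G estimated max heap reported by "java -XshowSettings:all" is only the JVM's ergonomic default and should not apply to the task JVMs, which are started with an explicit -Xmx (4300m / 8601m); and the per-node totals can be cross-checked as follows (Python, illustrative only, variable names are just labels).

# Cross-check of the per-node process figures quoted just above.
procs           = 12        # java processes observed on the node
mem_per_proc_mb = 5493.24   # reported memory usage per process
nm_limit_mb     = 122635    # yarn.nodemanager.resource.memory-mb

total_mb = procs * mem_per_proc_mb
print(f"total measured:                     {total_mb:.1f} MB")             # ~65918.9 MB, matches the figure above
print(f"share of the NodeManager allotment: {total_mb / nm_limit_mb:.0%}")  # ~54%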
-----------------------------------------------------------------------------------------------------

If you need more details, please let me know.
Thanks!

Guido.

Posted by gsalerno
2022-09-16T11:44:30Z

