Created 07-18-2016 08:43 AM
Hi,
I'm trying to find the optimal YARN memory configuration for running some MapReduce (MR) tasks in R. For the moment, I have a single node with around 40 GB of RAM available. I have tried different memory combinations, but all of them result in Java heap space exceptions when I execute a simple MR job written in R (using the plyrmr library) that processes a small text file (a few KB). The relevant memory configuration parameters I have so far (in yarn-site.xml and mapred-site.xml) are:
yarn.scheduler.maximum-allocation-mb = 24576
yarn.scheduler.minimum-allocation-mb = 3076
yarn.app.mapreduce.am.resource.mb = 3076
mapreduce.map.java.opts = -Xmx2457m
mapreduce.map.memory.mb = 3072
mapreduce.reduce.java.opts = -Xmx4915m
mapreduce.reduce.memory.mb = 6144
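As a quick sanity check on these values, the sketch below compares each -Xmx setting with the size of the container it runs in. The property names and numbers are the ones listed above; the helper names (xmx_mb, heap_ratio) are hypothetical, and the 0.8 heap-to-container target is only a commonly used rule of thumb, not something mandated by YARN.

```python
import re

# Values copied from the configuration listed in this post.
settings = {
    "mapreduce.map.memory.mb": 3072,
    "mapreduce.map.java.opts": "-Xmx2457m",
    "mapreduce.reduce.memory.mb": 6144,
    "mapreduce.reduce.java.opts": "-Xmx4915m",
}

def xmx_mb(java_opts):
    """Return the last -Xmx value found in a JVM option string, in MB.

    Returns None when no -Xmx flag is present.
    """
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if not matches:
        return None
    value, unit = matches[-1]
    value = int(value)
    unit = unit.lower()
    if unit == "g":
        return value * 1024
    if unit == "k":
        return value / 1024
    if unit == "":
        return value / (1024 * 1024)  # a bare number is a size in bytes
    return value  # unit is "m"

def heap_ratio(task):
    """Heap size divided by container size for 'map' or 'reduce'."""
    heap = xmx_mb(settings[f"mapreduce.{task}.java.opts"])
    container = settings[f"mapreduce.{task}.memory.mb"]
    return heap / container

for task in ("map", "reduce"):
    print(task, round(heap_ratio(task), 3))
```

Both ratios come out at roughly 0.8, so the configured heaps fit inside their containers. The same xmx_mb helper can be pointed at the java command line in a container's launch log to see which heap size a task JVM was actually started with.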
Is there any other memory configuration parameter that needs to be set or adjusted? After launching the task, 2 split jobs are created and a Java heap space exception is raised. Looking through the YARN logs of the application that raises the exception, I stumbled upon the following line after launch_container.sh is executed:
exec /bin/bash -c "$JAVA_HOME/bin/java -server -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.4.0.0-169 -Xmx400M
What are these 400 MB of Java heap space for? I have checked a lot of different configuration files, but I couldn't find any parameter related to these 400 MB. Is there any other Java parameter that needs to be set in addition to the configuration properties listed above?
The relevant part of the MR task log is:
INFO mapreduce.Job: Counters: 17
    Job Counters
        Failed map tasks=7
        Killed map tasks=1
        Killed reduce tasks=1
        Launched map tasks=8
        Other local map tasks=6
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=37110
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=37110
        Total time spent by all reduce tasks (ms)=0
        Total vcore-seconds taken by all map tasks=37110
        Total vcore-seconds taken by all reduce tasks=0
        Total megabyte-seconds taken by all map tasks=114001920
        Total megabyte-seconds taken by all reduce tasks=0
    Map-Reduce Framework
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
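The counters themselves allow a small cross-check of what each map attempt was given. The sketch below only does arithmetic on the numbers in the log; it assumes that the memory counter is the map running time multiplied by the container memory in MB, and the dictionary keys and function names are hypothetical labels chosen for this example.

```python
# Counter values copied from the job log in this post.
counters = {
    "launched_map_tasks": 8,
    "failed_map_tasks": 7,
    "killed_map_tasks": 1,
    "map_millis": 37110,             # Total time spent by all map tasks (ms)
    "map_vcore_counter": 37110,      # Total vcore-seconds taken by all map tasks
    "map_megabyte_counter": 114001920,  # Total megabyte-seconds taken by all map tasks
}

def mb_per_map_container(c):
    """Memory per map attempt, assuming counter = running time * container MB."""
    return c["map_megabyte_counter"] / c["map_millis"]

def vcores_per_map_container(c):
    """Vcores per map attempt, under the same assumption."""
    return c["map_vcore_counter"] / c["map_millis"]

def unsuccessful_map_attempts(c):
    """Map attempts that ended as failed or killed."""
    return c["failed_map_tasks"] + c["killed_map_tasks"]

print("MB per map container:", mb_per_map_container(counters))
print("vcores per map container:", vcores_per_map_container(counters))
print("unsuccessful attempts:", unsuccessful_map_attempts(counters),
      "of", counters["launched_map_tasks"], "launched")
```

Under that assumption, the map attempts were accounted at 3072 MB and one vcore each, which matches mapreduce.map.memory.mb, and every one of the 8 launched map attempts ended as failed or killed.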
Is there anything that I'm missing?
Thanks a lot for your time.
Created 07-18-2016 04:31 PM
Hi @Jaime
I think it is the NameNode's maximum Java heap size.
Please go through these settings (you have to change them in the HDFS config):
Please let us know if this fixes the problem.
Thanks
Created 07-19-2016 06:58 AM
Hi @rbiswas,
Thanks for your comment. I didn't know how to adjust that memory parameter. However, looking in hadoop-env.sh, I discovered that it was set to 1024 MB:
export HADOOP_NAMENODE_INIT_HEAPSIZE="-Xms1024m"
Unfortunately, that didn't solve the problem.
Created 07-19-2016 07:53 AM
Hi again @rbiswas,
As far as I know, each time a mapper (or reducer) is created, the ApplicationMaster will request the NodeManager to allocate a new container with mapreduce.map.memory.mb (or mapreduce.reduce.memory.mb) MB of memory available. So, with my specific configuration, if three mappers are created, YARN will try to create three containers with 3072 MB each. Am I right?
If so, what happens if YARN can't reserve 3 * 3072 MB? Will it raise a Java heap space exception?
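The arithmetic behind this question can be sketched as follows. This is only a rough model under stated assumptions: it assumes the scheduler rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb (which depends on the scheduler and resource calculator in use), and NODE_MB is a hypothetical value for yarn.nodemanager.resource.memory-mb, which is not given in this thread. The function names are made up for the example.

```python
MIN_ALLOC_MB = 3076    # yarn.scheduler.minimum-allocation-mb (from this thread)
MAX_ALLOC_MB = 24576   # yarn.scheduler.maximum-allocation-mb (from this thread)
AM_MB = 3076           # yarn.app.mapreduce.am.resource.mb (from this thread)
MAP_MB = 3072          # mapreduce.map.memory.mb (from this thread)
REDUCE_MB = 6144       # mapreduce.reduce.memory.mb (from this thread)
NODE_MB = 24576        # ASSUMPTION: hypothetical yarn.nodemanager.resource.memory-mb

def normalize(request_mb, min_mb=MIN_ALLOC_MB, max_mb=MAX_ALLOC_MB):
    """Round a request up to a multiple of the minimum allocation, capped at the maximum.

    Assumed behaviour; the real rounding depends on the scheduler configuration.
    """
    rounded = (request_mb + min_mb - 1) // min_mb * min_mb
    return min(max(rounded, min_mb), max_mb)

def containers_that_fit(free_mb, container_mb):
    """How many containers of a given size fit into the free memory."""
    return free_mb // container_mb

map_container = normalize(MAP_MB)
reduce_container = normalize(REDUCE_MB)
free_after_am = NODE_MB - normalize(AM_MB)

print("map container MB:", map_container)
print("reduce container MB:", reduce_container)
print("map containers next to the AM:", containers_that_fit(free_after_am, map_container))
```

With these numbers, three map containers fit comfortably beside the ApplicationMaster. A request that does not fit would normally stay pending in the scheduler until memory is freed; a Java heap space error is thrown by the JVM inside a container that is already running.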
Thanks in advance.
Created 07-18-2018 11:40 AM
Have you found a solution?
I am looking for a solution to my own use case; please follow the link:
https://community.hortonworks.com/questions/203537/container-allocation-by-application-master-in-had...