YARN memory configuration parameters and Java Heap Space.

Contributor

Hi,

I'm trying to find the optimal memory configuration in YARN to run some MR tasks in R. For the moment, I have a single node with around 40 GB of RAM available. I have tried different memory combinations, but all of them result in Java heap space exceptions when executing a simple MR job in R (using the plyrmr library) to process a small text file (a few KB in size). The relevant memory configuration parameters I have so far (in yarn-site.xml and mapred-site.xml) are:

yarn.scheduler.maximum-allocation-mb = 24576
yarn.scheduler.minimum-allocation-mb = 3076
yarn.app.mapreduce.am.resource.mb = 3076
mapreduce.map.java.opts = -Xmx2457m
mapreduce.map.memory.mb = 3072
mapreduce.reduce.java.opts = -Xmx4915m
mapreduce.reduce.memory.mb = 6144
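
For context, the yarn.* properties above live in yarn-site.xml and the mapreduce.* ones in mapred-site.xml. I sized the JVM heaps at roughly 80% of their container sizes, following the usual guideline (0.8 × 3072 ≈ 2457 MB for mappers, 0.8 × 6144 ≈ 4915 MB for reducers), so each JVM should in principle fit comfortably inside its container.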

Is there any other memory configuration parameter that needs to be set or adjusted? After launching the job, the input is split into two map tasks and a Java heap space exception is raised. Looking through the YARN logs of the application that raises the exception, I stumbled upon the following line executed by launch_container.sh:

exec /bin/bash -c "$JAVA_HOME/bin/java -server -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.4.0.0-169 -Xmx400M 

What is this 400 MB of Java heap for? I have checked a lot of different configuration files, but I couldn't find any parameter related to this 400 MB value. Is there any other Java parameter that needs to be set besides the configuration properties listed above?
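
Could it be injected by the streaming layer itself? As far as I know, when several -Xmx flags end up on the same java command line, the last one wins, so a -Xmx400M appended at the end would override my configured heap. If I understand the rmr2 documentation correctly, mapreduce() accepts a backend.parameters argument that forwards extra -D properties to the underlying hadoop streaming call, so something like the sketch below might force the heap per job (the input path and the minimal map function are just illustrative; my real code uses plyrmr):

library(rmr2)

rmr.options(backend = "hadoop")  # run against the cluster, not the local backend

# Minimal word-count-style job over a small text file.
# backend.parameters forwards extra -D properties to the underlying
# hadoop streaming invocation (per the rmr2 documentation).
result <- mapreduce(
  input = "/tmp/small.txt",      # illustrative path
  input.format = "text",
  map = function(k, v) keyval(v, 1),
  backend.parameters = list(
    hadoop = list(
      D = "mapreduce.map.memory.mb=3072",
      D = "mapreduce.map.java.opts=-Xmx2457m"
    )
  )
)

Would that be a reasonable workaround, or should the values in the *-site.xml files be enough on their own?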

The relevant part of the MR job log is:

INFO mapreduce.Job: Counters: 17
	Job Counters 
		Failed map tasks=7
		Killed map tasks=1
		Killed reduce tasks=1
		Launched map tasks=8
		Other local map tasks=6
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=37110
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=37110
		Total time spent by all reduce tasks (ms)=0
		Total vcore-seconds taken by all map tasks=37110
		Total vcore-seconds taken by all reduce tasks=0
		Total megabyte-seconds taken by all map tasks=114001920
		Total megabyte-seconds taken by all reduce tasks=0
	Map-Reduce Framework
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0

Is there anything that I'm missing?

Thanks a lot for your time.

4 REPLIES


Hi @Jaime

I think it is the NameNode's total Java max heap size.

Please go through these settings (you have to change them from the HDFS config):

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/ref-809...

Please let us know if this fixes the problem.

Thanks

Contributor

Hi @rbiswas,

Thanks for your comment. I didn't know about that memory parameter. However, looking in hadoop-env.sh, I discovered that it was set to 1024 MB:

export HADOOP_NAMENODE_INIT_HEAPSIZE="-Xms1024m"

Unfortunately, that didn't solve the problem. In hindsight that may make sense, since this variable appears to set the heap of the NameNode daemon itself rather than that of the task JVMs.

Contributor

Hi again @rbiswas,

To my knowledge, each time a mapper (or reducer) is created, the ApplicationMaster asks YARN to allocate a new container with mapreduce.map.memory.mb (or mapreduce.reduce.memory.mb) MB available. So, with my specific configuration, if three mappers are created, YARN will try to create three containers with 3072 MB each. Am I right?

If so, what happens if YARN can't reserve 3 × 3072 MB? Will it raise a Java heap space exception?
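
To put numbers on it: three mappers would need 3 × 3072 MB = 9216 MB, plus 3076 MB for the ApplicationMaster container, roughly 12 GB in total, which should still fit comfortably in the ~40 GB available on the node, so I don't see where the memory pressure would come from.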

Thanks in advance.

Explorer