I am running into an issue where a MapReduce job causes the Hadoop environment to run out of memory. The way YARN is configured, the values are higher than the settings recommended by Hortonworks, so a large amount of RAM is allocated when the containers are created. My input file has over 600 million rows. It appears the container sizes and reserved system memory should be reduced.
We ran this script to identify the allocated resources; the output is shown below:
python yarn-utils.py -c 24 -m 256 -d 12 -k True
Using cores=24 memory=256GB disks=12 hbase=True
Profile: cores=24 memory=196608MB reserved=64GB usableMem=192GB disks=12
Per the Hortonworks documentation, http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_installing_manually_book/content/determin...., the memory per container is four times the recommended value, and the reserved system memory is set to twice the recommended value.
Can someone confirm: if I change the reserved system memory and the container sizes in the yarn-site.xml file, do I need to change any additional values? The Hadoop cluster consists of 24 cores, 256 GB RAM, and 12 disks per node.
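For reference, the container sizes in question are controlled by the standard YARN properties below. This is only an illustrative sketch; the actual numbers (shown here as placeholders) should come from re-running yarn-utils.py with the corrected reserved-memory figures from the Hortonworks guide, not from this example:

```xml
<!-- yarn-site.xml: illustrative property names only; values must be
     recomputed for this cluster (24 cores, 256 GB RAM, 12 disks, HBase). -->
<property>
  <!-- Total RAM YARN may hand out on each node = physical RAM minus
       reserved system memory (and minus the HBase reservation). -->
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>196608</value>
</property>
<property>
  <!-- Smallest container YARN will allocate. -->
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <!-- Largest single container; often set equal to resource.memory-mb. -->
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>196608</value>
</property>
```

If you shrink the container size here, the per-task requests in mapred-site.xml (mapreduce.map.memory.mb, mapreduce.reduce.memory.mb, and the matching java.opts heap settings) generally need to be adjusted to fit, so yes, changing only yarn-site.xml is usually not sufficient.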
You can assign more memory by editing the conf/mapred-site.xml file and adding a property that raises the task JVM heap size.
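A minimal sketch of such a property, assuming the classic MRv1-style setting (the -Xmx value here is only an example and should be sized to fit inside your container allocation):

```xml
<!-- mapred-site.xml: example heap setting, value is illustrative. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```

On YARN/MRv2 clusters the split equivalents mapreduce.map.java.opts and mapreduce.reduce.java.opts are normally used instead, and each heap must stay below the corresponding mapreduce.map.memory.mb / mapreduce.reduce.memory.mb container size.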
This will start the Hadoop task JVMs with more heap space.
If you are still facing the same issue, please refer to this link.