Datanode Heapsize Computation

Rising Star

Hi Team,

I am using HDP 2.4.2 with 39 DataNodes in the cluster. Initially the DataNode heap size was the default 1 GB. Every DataNode then started sending heap-usage warning alerts, even though no ingestion or job was running. I increased the DataNode heap size to 2 GB, but the alerts continue: heap usage sits at 60-70% and sometimes reaches 80-90% while the cluster is idle. Is there a calculation or formula for how much heap to give the DataNodes? Please help.


@Rahul Buragohain If you believe a 2 GB heap is enough for your DataNode (and it is idle most of the time yet still frequently uses that much memory), look at the DataNode GC log to find out whether garbage collection is running properly.
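One way to read the GC log is to look at how much heap remains *after* each collection. The sketch below assumes the Java 7/8 HotSpot format produced by `-XX:+PrintGCDetails` (the DataNode options quoted later in this thread enable it); the function names are hypothetical and other JVM versions print differently.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Whole-heap "before->after(capacity)" triple that -XX:+PrintGCDetails prints
# right after the generation-specific bracket, e.g.
#   "[ParNew: 184320K->20480K(184320K), 0.02 secs] 1500000K->1350000K(2027264K), ..."
# Assumption: Java 7/8 HotSpot log format.
HEAP_RE = re.compile(r"\]\s+(\d+)K->(\d+)K\((\d+)K\)")


def heap_after_gc(lines: Iterable[str]) -> Iterator[Tuple[int, int, int]]:
    """Yield (before_kb, after_kb, capacity_kb) for each collection in the log."""
    for line in lines:
        match = HEAP_RE.search(line)
        if match:
            before, after, capacity = (int(g) for g in match.groups())
            yield before, after, capacity


def max_after_gc_pct(lines: Iterable[str]) -> Optional[float]:
    """Highest heap occupancy left after a collection, as % of capacity."""
    pcts = [100.0 * after / cap for _, after, cap in heap_after_gc(lines) if cap]
    return max(pcts) if pcts else None
```

For example, `max_after_gc_pct(open(path))` on a DataNode `gc.log-*` file. If the heap left after collections stays well below the alert threshold, the 60-90% readings are probably garbage the collector has not reclaimed yet; if it stays high, the DataNode may genuinely need more heap.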

If your GC tuning is already good, you might be hitting the following issue.

However, I would suggest trying the following DataNode JVM options to see whether they help:

-XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly  -XX:+UseConcMarkSweepGC
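`-XX:CMSInitiatingOccupancyFraction` is a percentage of the old generation, not of the whole heap, and `-XX:+UseCMSInitiatingOccupancyOnly` tells the JVM to use only that threshold rather than its own heuristics. When the fraction is not set, HotSpot derives one from `MinHeapFreeRatio` and `CMSTriggerRatio`, which with their defaults (40 and 80) comes to 92%; that may explain why heap usage on an idle DataNode climbs to 80-90% before a collection runs. The arithmetic sketch below assumes the 2 GB heap from the question and the fixed 200 MB young generation (`-XX:NewSize=200m -XX:MaxNewSize=200m`) from the DataNode options quoted later in this thread; the function names are hypothetical.

```python
def default_cms_fraction(min_heap_free_ratio: int = 40, cms_trigger_ratio: int = 80) -> float:
    """Implied CMS start threshold (% of old gen) when
    -XX:CMSInitiatingOccupancyFraction is not set; the defaults are the
    HotSpot defaults for MinHeapFreeRatio and CMSTriggerRatio."""
    return (100 - min_heap_free_ratio) + cms_trigger_ratio * min_heap_free_ratio / 100.0


def cms_trigger_mb(heap_mb: float, young_gen_mb: float, fraction_pct: float) -> float:
    """Old-generation occupancy (MB) at which CMS starts a concurrent cycle.

    Assumes a fixed heap (-Xms == -Xmx) and a fixed young generation
    (-XX:NewSize == -XX:MaxNewSize), so old gen = heap - young gen.
    """
    old_gen_mb = heap_mb - young_gen_mb
    return old_gen_mb * fraction_pct / 100.0


# cms_trigger_mb(2048, 200, 60)                      -> 1108.8 MB of a 1848 MB old gen
# cms_trigger_mb(2048, 200, default_cms_fraction())  -> about 1700 MB (92%)
```

With the suggested value of 60, CMS would start reclaiming the old generation roughly 600 MB earlier than with the default in this configuration.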


Currently there is a formula for calculating the NameNode heap size, but not one for the DataNode.


Rising Star


Thanks a lot. That solved my issue and I am not getting DN heapsize alerts anymore.

Expert Contributor

@jss @Rahul Buragohain

I have the same issue with HDP 2.4.2. Where exactly do I change these parameters?

I see them in the hadoop-env template:

SHARED_HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile={{hdfs_log_dir_prefix}}/$USER/hs_err_pid%p.log -XX:NewSize={{namenode_opt_newsize}} -XX:MaxNewSize={{namenode_opt_maxnewsize}} -Xloggc:{{hdfs_log_dir_prefix}}/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms{{namenode_heapsize}} -Xmx{{namenode_heapsize}},DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT"
export HADOOP_NAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node\" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 ${HADOOP_NAMENODE_OPTS}"
export HADOOP_DATANODE_OPTS="-server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms{{dtnode_heapsize}} -Xmx{{dtnode_heapsize}},DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_DATANODE_OPTS}"

If this is the right file, should I just add the parameters mentioned above to HADOOP_DATANODE_OPTS?

And do I need to restart the HDFS service?
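A minimal sketch of one way to do this, assuming the stock hadoop-env template shown above is the one your cluster manager renders: the existing HADOOP_DATANODE_OPTS line already contains -XX:+UseConcMarkSweepGC, so only the two occupancy flags need adding, for example on a new line after that export. The placement is an illustrative choice, not an official procedure.

```shell
# Hypothetical addition, placed after the existing HADOOP_DATANODE_OPTS export
# in the hadoop-env template. -XX:+UseConcMarkSweepGC is already set there,
# so only the two CMS occupancy flags are appended.
export HADOOP_DATANODE_OPTS="${HADOOP_DATANODE_OPTS} -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly"
```

JVM options are read only when a process starts, so the DataNode processes would need to be restarted for the new flags to take effect.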
