
Datanode Heapsize Computation

Solved


Contributor

Hi Team,

I am using HDP 2.4.2 and I have 39 DataNodes in the cluster. Initially the DataNode heap size was the default 1 GB, and every DataNode started sending warning alerts even though there was no ingestion or job running. So I increased the DataNode heap size to 2 GB, but it is still alerting at 60-70% heap usage, and sometimes 80-90%, even though the cluster is idle. Is there any calculation/formula for how much heap size I should give the DataNodes? Please help.

Thanks,

Rahul

1 ACCEPTED SOLUTION


Re: Datanode Heapsize Computation

@Rahul Buragohain If you believe the 2 GB heap should be enough for your DataNode (it is idle most of the time yet still frequently consuming that much memory), then you should look at the DataNode GC log to find out whether garbage collection is happening properly.
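
For example, a quick way to check is to look for frequent full collections in the DataNode GC log. This is only a rough sketch; the exact log path is an assumption and depends on the -Xloggc setting in your hadoop-env (commonly /var/log/hadoop/<user>/gc.log-<timestamp> on HDP):

# count Full GC events in the DataNode GC log (path is an assumption; use your -Xloggc value)
grep -c "Full GC" /var/log/hadoop/hdfs/gc.log-*
# inspect the most recent full collections and their pause times
grep "Full GC" /var/log/hadoop/hdfs/gc.log-* | tail -n 5

If full GCs are frequent or the old generation never shrinks, that points at GC tuning (or the JIRA below) rather than a genuinely undersized heap.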

If your GC tuning looks fine, you might be hitting the following issue: https://issues.apache.org/jira/browse/HDFS-11047

However, I would suggest trying the following DataNode JVM options to see whether things improve.

-XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly  -XX:+UseConcMarkSweepGC
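
For example, in an Ambari-managed cluster these would typically be added to the HADOOP_DATANODE_OPTS line of the hadoop-env template. The line below is a sketch only, based on the stock HDP template; keep your existing flags and heap settings and just add the two CMS occupancy options:

# sketch: stock HDP HADOOP_DATANODE_OPTS with the two CMS occupancy flags added
export HADOOP_DATANODE_OPTS="-server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms{{dtnode_heapsize}} -Xmx{{dtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_DATANODE_OPTS}"

The DataNodes need to be restarted for new JVM options to take effect.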


Currently the documentation only gives a formula for calculating the heap size of a NameNode, not of a DataNode:

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.6/bk_installing_manually_book/content/ref-8095...


3 REPLIES

Re: Datanode Heapsize Computation

Contributor

@jss

Thanks a lot. That solved my issue and I am not getting DN heapsize alerts anymore.


Re: Datanode Heapsize Computation

Expert Contributor

@jss @Rahul Buragohain

I have the same issue on my HDP 2.4.2... Where exactly do I change these parameters?

I see them in the hadoop-env template:

SHARED_HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile={{hdfs_log_dir_prefix}}/$USER/hs_err_pid%p.log -XX:NewSize={{namenode_opt_newsize}} -XX:MaxNewSize={{namenode_opt_maxnewsize}} -Xloggc:{{hdfs_log_dir_prefix}}/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms{{namenode_heapsize}} -Xmx{{namenode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT"

export HADOOP_NAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node\" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 ${HADOOP_NAMENODE_OPTS}"

export HADOOP_DATANODE_OPTS="-server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms{{dtnode_heapsize}} -Xmx{{dtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_DATANODE_OPTS}"

If this is the right file, should I just add the mentioned parameters to HADOOP_DATANODE_OPTS?

And do I need to restart the HDFS service?

Thanks.
