<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Datanode Heapsize Computation (Support Questions)</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Datanode-Heapsize-Computation/m-p/162667#M125043</link>
    <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/3578/rburagohain.html"&gt;Rahul Buragohain&lt;/A&gt; If you believe that a 2GB heap is enough for your DataNode (and it is idle most of the time, yet still frequently consumes that much memory), then you should look at the DataNode GC log to find out whether GC is running properly.&lt;/P&gt;&lt;P&gt;If your GC tuning is already sound, you might be hitting the following issue: &lt;A href="https://issues.apache.org/jira/browse/HDFS-11047" target="_blank"&gt;https://issues.apache.org/jira/browse/HDFS-11047&lt;/A&gt;&lt;/P&gt;&lt;P&gt;In the meantime, I would suggest trying the following DataNode JVM options to see whether they bring an improvement.&lt;/P&gt;&lt;PRE&gt;-XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseConcMarkSweepGC&lt;/PRE&gt;&lt;P&gt;Note that the documentation currently only provides a formula for calculating the heap size of the NameNode, not the DataNode: &lt;A href="http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.6/bk_installing_manually_book/content/ref-80953924-1cbf-4655-9953-1e744290a6c3.1.html" target="_blank"&gt;http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.6/bk_installing_manually_book/content/ref-80953924-1cbf-4655-9953-1e744290a6c3.1.html&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 03 Nov 2016 12:47:50 GMT</pubDate>
    <dc:creator>Former Member</dc:creator>
    <dc:date>2016-11-03T12:47:50Z</dc:date>
    <item>
      <title>Datanode Heapsize Computation</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Datanode-Heapsize-Computation/m-p/162666#M125042</link>
      <description>&lt;P&gt;Hi Team,&lt;/P&gt;&lt;P&gt;I am using HDP 2.4.2 with 39 DataNodes in a cluster. Initially, the DataNode heap size was 1GB by default; then every DataNode started sending warning alerts even though no ingestion or jobs were running. So I increased the DataNode heap size to 2GB, but it still sends alerts showing 60-70% heap usage, and sometimes 80-90%, even though the cluster is idle. Is there any calculation/formula for how much heap size I should give the DataNodes? Please help.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Rahul&lt;/P&gt;</description>
      <pubDate>Thu, 03 Nov 2016 11:47:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Datanode-Heapsize-Computation/m-p/162666#M125042</guid>
      <dc:creator>rburagohain</dc:creator>
      <dc:date>2016-11-03T11:47:07Z</dc:date>
    </item>
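The question above asks for a sizing formula. There is no official DataNode heap formula in the HDP documentation (the linked docs only cover the NameNode), but a commonly cited rule of thumb scales DataNode heap with the number of block replicas the node stores. A hedged sketch, where both the base allowance and the per-million-blocks constant are illustrative assumptions, not vendor guidance:

```python
import math

def datanode_heap_gb(block_replicas, base_gb=1.0, gb_per_million=1.0):
    """Rule-of-thumb DataNode heap estimate: a base allowance plus
    roughly 1 GB per million block replicas stored on the node.
    The constants here are assumptions for illustration only."""
    return max(1, math.ceil(base_gb + (block_replicas / 1_000_000) * gb_per_million))

# e.g. a node holding ~2.5 million block replicas
print(datanode_heap_gb(2_500_000))  # -> 4
```

The block-replica count per node can be read from the NameNode web UI; under these assumptions a mostly empty node would still get the 1 GB default.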
    <item>
      <title>Re: Datanode Heapsize Computation</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Datanode-Heapsize-Computation/m-p/162667#M125043</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/3578/rburagohain.html"&gt;Rahul Buragohain&lt;/A&gt; If you believe that a 2GB heap is enough for your DataNode (and it is idle most of the time, yet still frequently consumes that much memory), then you should look at the DataNode GC log to find out whether GC is running properly.&lt;/P&gt;&lt;P&gt;If your GC tuning is already sound, you might be hitting the following issue: &lt;A href="https://issues.apache.org/jira/browse/HDFS-11047" target="_blank"&gt;https://issues.apache.org/jira/browse/HDFS-11047&lt;/A&gt;&lt;/P&gt;&lt;P&gt;In the meantime, I would suggest trying the following DataNode JVM options to see whether they bring an improvement.&lt;/P&gt;&lt;PRE&gt;-XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseConcMarkSweepGC&lt;/PRE&gt;&lt;P&gt;Note that the documentation currently only provides a formula for calculating the heap size of the NameNode, not the DataNode: &lt;A href="http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.6/bk_installing_manually_book/content/ref-80953924-1cbf-4655-9953-1e744290a6c3.1.html" target="_blank"&gt;http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.6/bk_installing_manually_book/content/ref-80953924-1cbf-4655-9953-1e744290a6c3.1.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 03 Nov 2016 12:47:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Datanode-Heapsize-Computation/m-p/162667#M125043</guid>
      <dc:creator>Former Member</dc:creator>
      <dc:date>2016-11-03T12:47:50Z</dc:date>
    </item>
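The CMS flags suggested in the reply above would typically go into the DataNode JVM options in hadoop-env.sh (or the hadoop-env template when the cluster is managed by Ambari). A minimal sketch, assuming the stock HDP template where HADOOP_DATANODE_OPTS is already defined:

```shell
# hadoop-env.sh (or Ambari's hadoop-env template) -- sketch only.
# Append the CMS tuning flags so the concurrent collector starts a
# cycle once the old generation is 60% full, instead of waiting for
# the heap to fill up and triggering usage alerts.
export HADOOP_DATANODE_OPTS="${HADOOP_DATANODE_OPTS} \
  -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=60 \
  -XX:+UseCMSInitiatingOccupancyOnly"
```

A DataNode restart is needed for changed JVM options to take effect.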
    <item>
      <title>Re: Datanode Heapsize Computation</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Datanode-Heapsize-Computation/m-p/162668#M125044</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3823/joysensharma.html" nodeid="3823"&gt;@jss&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Thanks a lot. That solved my issue, and I am no longer getting DataNode heap-size alerts.&lt;/P&gt;</description>
      <pubDate>Sun, 27 Nov 2016 16:26:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Datanode-Heapsize-Computation/m-p/162668#M125044</guid>
      <dc:creator>rburagohain</dc:creator>
      <dc:date>2016-11-27T16:26:11Z</dc:date>
    </item>
    <item>
      <title>Re: Datanode Heapsize Computation</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Datanode-Heapsize-Computation/m-p/162669#M125045</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3823/joysensharma.html" nodeid="3823"&gt;@jss&lt;/A&gt; &lt;A rel="user" href="https://community.cloudera.com/users/3578/rburagohain.html" nodeid="3578"&gt;@Rahul Buragohain&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I have the same issue with my HDP 2.4.2 cluster. Where exactly do I change these parameters?&lt;/P&gt;&lt;P&gt;I see them in the hadoop-env template:&lt;/P&gt;&lt;P&gt;SHARED_HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile={{hdfs_log_dir_prefix}}/$USER/hs_err_pid%p.log -XX:NewSize={{namenode_opt_newsize}} -XX:MaxNewSize={{namenode_opt_maxnewsize}} -Xloggc:{{hdfs_log_dir_prefix}}/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps &lt;STRONG&gt;-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly&lt;/STRONG&gt; -Xms{{namenode_heapsize}} -Xmx{{namenode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT"
export HADOOP_NAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node\" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 ${HADOOP_NAMENODE_OPTS}"
export HADOOP_DATANODE_OPTS="-server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms{{dtnode_heapsize}} -Xmx{{dtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_DATANODE_OPTS}"&lt;/P&gt;&lt;P&gt;If this is the right file, should I just add the suggested parameters to HADOOP_DATANODE_OPTS?&lt;/P&gt;&lt;P&gt;And do I need to restart the HDFS service afterwards?&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Sat, 21 Jan 2017 04:42:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Datanode-Heapsize-Computation/m-p/162669#M125045</guid>
      <dc:creator>pmj</dc:creator>
      <dc:date>2017-01-21T04:42:45Z</dc:date>
    </item>
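The accepted answer's first step was to inspect the DataNode GC log before changing any flags. A minimal sketch of how one might summarize GC pause counts and total pause time from such a log; the log-line format below is a simplified stand-in for real -verbose:gc / -XX:+PrintGCDetails output, so the regex would need adjusting for actual logs:

```python
import re

def summarize_gc(lines):
    """Count minor and full GC events and total pause seconds from
    simplified GC log lines like '[GC ... 0.0123 secs]' or
    '[Full GC ... 1.2345 secs]' (illustrative format, not the exact
    HotSpot output)."""
    pause_re = re.compile(r"\[(Full GC|GC).*?([\d.]+) secs\]")
    full, minor, total_secs = 0, 0, 0.0
    for line in lines:
        m = pause_re.search(line)
        if not m:
            continue
        if m.group(1) == "Full GC":
            full += 1   # frequent full GCs on an idle node suggest tuning trouble
        else:
            minor += 1
        total_secs += float(m.group(2))
    return {"full": full, "minor": minor, "total_secs": round(total_secs, 4)}

sample = [
    "2016-11-03T12:00:01: [GC (Allocation Failure) 0.0123 secs]",
    "2016-11-03T12:00:09: [Full GC (Ergonomics) 1.2345 secs]",
    "2016-11-03T12:00:15: [GC (Allocation Failure) 0.0100 secs]",
]
print(summarize_gc(sample))  # -> {'full': 1, 'minor': 2, 'total_secs': 1.2568}
```

On the HDP template shown above, the DataNode GC log path would be /var/log/hadoop/$USER/gc.log-&lt;timestamp&gt; as set by the -Xloggc flag.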
  </channel>
</rss>

