Support Questions

ansarifah · ‎07-15-2016

how much amount to be set heap memory size? It means heap memory size is depend on which factor?

bandarusridhar1 · ‎07-15-2016

NameNode heap size depends on many factors such as the number of files, the number of blocks, and the load on the system. The settings in the referenced table below should work for typical Hadoop clusters where the number of blocks is very close to the number of files (generally the average ratio of number of blocks per file in a system is 1.1 to 1.2). Some clusters might require further tweaking of the following settings. Also, it is generally better to set the total Java heap to a higher value.

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_installing_manually_book/content/ref-8095...

View solution in original post

bandarusridhar1 · ‎07-15-2016

@ANSARI FAHEEM AHMED

NameNode heap size depends on many factors such as the number of files, the number of blocks, and the load on the system. The settings in the referenced table below should work for typical Hadoop clusters where the number of blocks is very close to the number of files (generally the average ratio of number of blocks per file in a system is 1.1 to 1.2). Some clusters might require further tweaking of the following settings. Also, it is generally better to set the total Java heap to a higher value.

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_installing_manually_book/content/ref-8095...

ansarifah · ‎07-15-2016

@Bandaru: Thanks but how to set in ambari cluter ?

bandarusridhar1 · ‎07-15-2016

@ANSARI FAHEEM AHMED

HADOOP_HEAPSIZE sets the JVM heap size for all Hadoop project servers such as HDFS, YARN, and MapReduce. HADOOP_HEAPSIZE is an integer passed to the JVM as the maximum memory (Xmx) argument. For example:

HADOOP_HEAPSIZE=256

HADOOP_NAMENODE_OPTS is specific to the NameNode and sets all JVM flags, which must be specified.HADOOP_NAMENODE_OPTS overrides the HADOOP_HEAPSIZE Xmx value for the NameNode. For example:

HADOOP_NAMENODE_OPTS=SHARED_HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=50m -XX:MaxNewSize=100m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms250m -Xmx250m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT"

Both HADOOP_NAMENODE_OPTS and HADOOP_HEAPSIZE are stored in /etc/hadoop/conf/hadoop-env.sh.

ansarifah · ‎07-15-2016

Thanks Sir Bandaru

bleonhardi · ‎07-15-2016

There is a control in ambari under HDFS right at the top for the memory of the Namenode. You should not set the heap size for all components since most need much less memory than the namenode.

For Namenode a good rule of thumb is 1GB for 100TB of data in HDFS ( plus a couple GB base so 4-8 min ) but it needs to be tuned based on workload ( if you suspect your memory settings to be insufficient you can look at the memory and GC behaviour of your JVM )

ansarifah · ‎07-16-2016

@Benjamin Thanks

Cloudera Community

Support Questions

How to set the NameNode Heap memory in ambari?