Created 07-15-2016 03:59 AM
How much should the heap memory size be set to? In other words, which factors does the heap memory size depend on?
Created 07-15-2016 04:13 AM
NameNode heap size depends on many factors, such as the number of files, the number of blocks, and the load on the system. The settings in the referenced table should work for typical Hadoop clusters, where the number of blocks is very close to the number of files (the average ratio of blocks per file in a system is generally 1.1 to 1.2). Some clusters might require further tuning of these settings. Also, it is generally better to set the total Java heap to a higher value.
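As a rough cross-check (this is not from the referenced table, just the commonly cited rule of thumb that each file, directory, and block consumes on the order of 150 bytes of NameNode heap): a cluster with 100 million files at ~1.1 blocks per file holds roughly 210 million namespace objects, so 210,000,000 × 150 bytes ≈ 31.5 GB of heap, before leaving headroom for RPC handling and garbage collection.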
Created 07-15-2016 04:35 AM
@Bandaru: Thanks, but how do I set this in an Ambari cluster?
Created 07-15-2016 04:55 AM
HADOOP_HEAPSIZE sets the JVM heap size for all Hadoop project servers such as HDFS, YARN, and MapReduce. HADOOP_HEAPSIZE is an integer (in MB) passed to the JVM as the maximum memory (-Xmx) argument. For example:
HADOOP_HEAPSIZE=256
HADOOP_NAMENODE_OPTS is specific to the NameNode and sets all the JVM flags, which must be specified. HADOOP_NAMENODE_OPTS overrides the HADOOP_HEAPSIZE Xmx value for the NameNode. For example:
HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=50m -XX:MaxNewSize=100m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms250m -Xmx250m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT"
Both HADOOP_NAMENODE_OPTS and HADOOP_HEAPSIZE are stored in /etc/hadoop/conf/hadoop-env.sh.
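In an Ambari-managed cluster, don't edit /etc/hadoop/conf/hadoop-env.sh directly, because Ambari regenerates it; instead change the hadoop-env template, typically found under Ambari > Services > HDFS > Configs > Advanced hadoop-env, and then restart the affected HDFS components. As a minimal sketch, the relevant lines look like this (the values are illustrative, not recommendations):
export HADOOP_HEAPSIZE=1024
export HADOOP_NAMENODE_OPTS="-Xms4096m -Xmx4096m ${HADOOP_NAMENODE_OPTS}"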
Created 07-15-2016 05:15 AM
Thanks, Sir Bandaru
Created 07-15-2016 11:06 AM
There is a control in Ambari, right at the top of the HDFS configuration page, for the memory of the NameNode. You should not set the heap size for all components, since most need much less memory than the NameNode.
For the NameNode, a good rule of thumb is 1 GB of heap per 100 TB of data in HDFS (plus a couple of GB of base overhead, so a minimum of 4-8 GB), but it needs to be tuned based on workload. If you suspect your memory settings are insufficient, look at the memory and GC behaviour of your JVM.
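To make that concrete with an illustrative figure (the 500 TB is made up for the example): 500 TB / 100 TB × 1 GB = 5 GB, plus a couple of GB of base overhead, gives roughly an 8 GB NameNode heap, e.g. -Xms8192m -Xmx8192m in HADOOP_NAMENODE_OPTS. Keeping -Xms equal to -Xmx avoids pauses from the JVM growing the heap, which is why the earlier example pins both to the same value.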
Created 07-16-2016 03:07 AM
@Benjamin Thanks