Currently our NameNodes have about 160GB of JVM heap (they run with "-Xmx160g -Xms160g"; the actual metadata size is about 120GB).
Can we increase it to, for example, 300GB or 500GB?
I'm worried that at some point the NameNode or Java might say something like "I can not handle this much memory, sorry and good bye..."
The HWX docs give some recommendations for the NameNode heap size based on the number of files it handles: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_command-line-installation/content/config...
NameNode heap size depends on many factors, such as the number of files, the number of blocks, and the load on the system. The following table provides recommendations for NameNode heap size configuration. These settings should work for typical Hadoop clusters in which the number of blocks is very close to the number of files.
However, if you have enough free RAM available, you can set a larger heap (Xmx) for a Java process such as the NameNode. The JVM should not complain if the heap size is set to a larger value (until we have enough RAM available)
Jay SenSharma Thank you for replying.
Our NameNode manages over 300 million HDFS files and directories. The table above says we need more than "104473m" of heap for "150-200 million".
And if the number of HDFS files and directories grows to 600 million, we would need about 300 GB (100GB x 3) of memory for the NameNode.
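The arithmetic above can be sketched as a simple linear extrapolation. This is only a rough estimate: it assumes heap usage grows linearly with the number of files and directories, and it takes the upper bound of the "150-200 million" row (200 million) as the reference point for the quoted "104473m". The function and constant names are made up for illustration.

```python
# Rough linear extrapolation of NameNode heap from one data point.
# Assumption: heap grows linearly with the number of files and directories.
TABLE_HEAP_MB = 104473        # heap quoted in the thread for the "150-200 million" row
TABLE_OBJECTS_MILLIONS = 200  # upper bound of that row, used as the reference point


def estimate_heap_mb(objects_millions: float) -> float:
    """Scale the table's heap value linearly to another object count."""
    mb_per_million = TABLE_HEAP_MB / TABLE_OBJECTS_MILLIONS
    return objects_millions * mb_per_million


if __name__ == "__main__":
    for millions in (300, 600):
        heap_mb = estimate_heap_mb(millions)
        print(f"{millions} million objects: ~{heap_mb / 1024:.0f} GB heap")
```

Under these assumptions, 600 million objects works out to roughly 306 GB, which matches the "about 300 GB" estimate. Real usage also depends on the block count and the load, so treat the result as a starting point only.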
> The JVM should not complain if the heap size is set to a larger value (until we have enough RAM available)
So you mean we can set the Java heap size to 300GB, and neither the NameNode nor its JVM should complain as long as the node has enough physical memory, for example 500 GB, right?
I think the largest concern is not the heap itself, but how the heap size affects garbage collection. The larger the heap, and the more objects that need to be scanned to see whether they should be purged, the longer a garbage collection can take. While the GC is taking place, the NameNode is paused and all DFS operations are paused until the GC is completed. If the GC takes too long, a failover can occur; it's not uncommon on some clusters to have to increase the failover timeout simply because garbage collection is taking longer than the timeout.
About full GC: we use CMS GC for these NameNodes. A cycle takes about 2 minutes in total, but the STW phases ("Initial Mark" and "Final Remark") take less than a few seconds in total. So we don't worry about the GC issue for now :)
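To make the difference between total CMS cycle time and stop-the-world time concrete, here is a minimal sketch. All phase durations and the 45-second timeout are illustrative assumptions, not measurements from this cluster or values read from any Hadoop configuration. They are chosen only to be consistent with "about 2 minutes in total, a few seconds of STW".

```python
# Compare stop-the-world (STW) pause time in a CMS cycle with a failover timeout.
# All durations and the timeout below are illustrative assumptions.
from typing import NamedTuple


class GcPhase(NamedTuple):
    name: str
    seconds: float
    stop_the_world: bool


def stw_seconds(phases):
    """Sum only the phases during which application threads are paused."""
    return sum(p.seconds for p in phases if p.stop_the_world)


def risks_failover(phases, failover_timeout_s):
    """True if the longest single STW pause reaches the timeout."""
    longest = max((p.seconds for p in phases if p.stop_the_world), default=0.0)
    return longest >= failover_timeout_s


# Hypothetical CMS cycle: about 120 s in total, of which 2 s are STW.
EXAMPLE_CYCLE = [
    GcPhase("initial mark", 0.5, True),
    GcPhase("concurrent mark", 70.0, False),
    GcPhase("concurrent preclean", 5.0, False),
    GcPhase("final remark", 1.5, True),
    GcPhase("concurrent sweep", 43.0, False),
]
HYPOTHETICAL_TIMEOUT_S = 45.0  # assumed value, for illustration only

if __name__ == "__main__":
    total = sum(p.seconds for p in EXAMPLE_CYCLE)
    print(f"cycle: {total:.0f} s, STW: {stw_seconds(EXAMPLE_CYCLE):.1f} s")
    print("failover risk:", risks_failover(EXAMPLE_CYCLE, HYPOTHETICAL_TIMEOUT_S))
```

The point of the sketch is that only the paused phases count against the timeout. A long concurrent cycle is harmless as long as each STW pause stays well under it, whereas a single long full-GC pause on a much larger heap could exceed it.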
Yes, we are now trying to use HDFS federation, with nn3 and nn4 serving a second nameservice, to reduce the heap usage of the first set of NameNodes (nn1 and nn2), which serve the first nameservice.
(Support HDFS Federation, please...)