The heap memory usage of NameNode is much higher than expected

Rising Star

Hi, I have a Hadoop 3.1.1 cluster, and I recently found that the NameNode heap memory usage in the cluster is very high. I saw 629,973,631 file objects in the cluster through the WebUI, so according to my calculations it should occupy no more than 90 GB of memory, right? Why is the current memory usage consistently above 140 GB? Is this related to my enabling erasure coding?
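My estimate was only a rough sketch, assuming the commonly cited figure of ~150 bytes of heap per namespace object:

```python
# Back-of-envelope estimate (sketch; assumes ~150 bytes of heap per namespace object)
objects = 629_973_631          # file objects reported by the NameNode WebUI
bytes_per_object = 150         # assumed average metadata footprint per object

estimate_gib = objects * bytes_per_object / 2**30
print(f"~{estimate_gib:.0f} GiB")   # ≈ 88 GiB, hence my "no more than 90 GB" expectation
```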

[Screenshot: Meepoljd_0-1705395074316.png]

1 ACCEPTED SOLUTION


Hi @Meepoljd ,

The file and block metadata is what consumes the NameNode heap. Can you share how you did your calculation?

Per our docs:

https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/hdfs-overview/topics/hdfs-sizing-namenode-hea... 

the file count should be kept below 300 million files. The same page also suggests that approximately 150 bytes of heap are needed for each namespace object; I assume you based your calculation on that. The real NN heap consumption varies with path lengths, ACL counts, replication factors, snapshots, operational load, etc. For that reason, on our other page

https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/hdfs-overview/topics/hdfs-examples-namenode-h... 

we suggest allocating a larger heap, roughly 1 GB of heap per 1 million blocks, which would be ~320 GB in your case.
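As a quick sketch of that rule of thumb (the block count below is only the figure implied by the ~320 GB estimate; please substitute the actual BlocksTotal from your NameNode):

```python
# Rule-of-thumb NameNode heap sizing: ~1 GB of heap per 1 million blocks.
# The block count here is illustrative only (implied by the ~320 GB figure above).
blocks_total = 320_000_000

recommended_heap_gb = blocks_total / 1_000_000
print(f"Suggested NameNode heap: ~{recommended_heap_gb:.0f} GB")   # ~320 GB
```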

Hope this helps,

Best regards, Miklos


4 REPLIES


Rising Star

Thank you very much for your answer. I will try to adjust the heap memory allocation. I would also like to ask how the conclusion of 1 GB of memory for every 1 million blocks was reached. Or is there a more precise calculation method that leads to that figure?


As far as I know, this is more of an empirical best practice. As mentioned, it cannot be calculated exactly, since there are variable factors (filename/path lengths, ACL counts, etc.) that change from environment to environment.
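If you want to compare the rule of thumb with your live numbers, one option is to read them from the NameNode's JMX servlet. This is just a sketch: it assumes the default Hadoop 3.x NameNode HTTP port (9870) and an unsecured /jmx endpoint, so adjust the host/port and add SPNEGO/TLS handling if your cluster is secured:

```python
# Sketch: fetch FilesTotal/BlocksTotal and current heap usage from the NameNode
# JMX servlet and compare against the 1 GB per 1 million blocks rule of thumb.
import json
import urllib.request

NN_HTTP = "http://namenode-host:9870"   # hypothetical host, replace with your NameNode

def jmx_bean(query):
    """Return the first JMX bean matching the given query string."""
    with urllib.request.urlopen(f"{NN_HTTP}/jmx?qry={query}") as resp:
        return json.load(resp)["beans"][0]

fsns = jmx_bean("Hadoop:service=NameNode,name=FSNamesystem")
mem = jmx_bean("java.lang:type=Memory")

files = fsns["FilesTotal"]
blocks = fsns["BlocksTotal"]
heap_used_gib = mem["HeapMemoryUsage"]["used"] / 2**30

print(f"FilesTotal={files:,}  BlocksTotal={blocks:,}")
print(f"Heap used: {heap_used_gib:.1f} GiB  vs  rule of thumb: ~{blocks / 1e6:.0f} GB")
```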
