The heap memory usage of NameNode is much higher than expected

Rising Star

Hi, I have a Hadoop 3.1.1 cluster, and I recently found that the NameNode heap memory usage in the cluster is very high. I saw 629,973,631 file objects in the cluster through the WebUI, so according to my calculations it should occupy no more than 90 GB of memory, right? Why is the current memory usage consistently above 140 GB? Is this related to my enabling erasure coding?
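My estimate was only a rough sketch, assuming the commonly cited figure of ~150 bytes of heap per namespace object:

```python
# Back-of-envelope estimate (sketch; assumes ~150 bytes of heap per namespace object)
objects = 629_973_631          # file objects reported by the NameNode WebUI
bytes_per_object = 150         # assumed average metadata footprint per object

estimate_gib = objects * bytes_per_object / 2**30
print(f"~{estimate_gib:.0f} GiB")   # ≈ 88 GiB, hence my "no more than 90 GB" expectation
```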

[Screenshot: Meepoljd_0-1705395074316.png]

1 ACCEPTED SOLUTION


Hi @Meepoljd ,

The file and block metadata is what consumes the NameNode heap. Can you share how you did your calculation?

Per our docs:

https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/hdfs-overview/topics/hdfs-sizing-namenode-hea... 

the file count should be kept below 300 million files. The same page also suggests that approximately 150 bytes of heap are needed for each namespace object; I assume you based your calculation on that. The real NN heap consumption varies with path lengths, ACL counts, replication factors, snapshots, operational load, etc. For that reason, on our other page

https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/hdfs-overview/topics/hdfs-examples-namenode-h... 

we suggest allocating a larger heap, roughly 1 GB of heap per 1 million blocks, which would be ~320 GB in your case.
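As a quick sketch of that rule of thumb (the block count below is only the figure implied by the ~320 GB estimate; please substitute the actual BlocksTotal from your NameNode):

```python
# Rule-of-thumb NameNode heap sizing: ~1 GB of heap per 1 million blocks.
# The block count here is illustrative only (implied by the ~320 GB figure above).
blocks_total = 320_000_000

recommended_heap_gb = blocks_total / 1_000_000
print(f"Suggested NameNode heap: ~{recommended_heap_gb:.0f} GB")   # ~320 GB
```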

Hope this helps,

Best regards, Miklos


4 REPLIES


Rising Star

Thank you very much for your answer. I will try to adjust the heap memory allocation. I would also like to ask how the conclusion of 1 GB of memory for every 1 million blocks was reached. Or is there a more precise calculation method that leads to that figure?


As far as I know, this is more of an empirical best practice. As mentioned, it cannot be calculated exactly, since there are variable factors (filename/path lengths, ACL counts, etc.) that change from environment to environment.
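If you want to compare the rule of thumb with your live numbers, one option is to read them from the NameNode's JMX servlet. This is just a sketch: it assumes the default Hadoop 3.x NameNode HTTP port (9870) and an unsecured /jmx endpoint, so adjust the host/port and add SPNEGO/TLS handling if your cluster is secured:

```python
# Sketch: fetch FilesTotal/BlocksTotal and current heap usage from the NameNode
# JMX servlet and compare against the 1 GB per 1 million blocks rule of thumb.
import json
import urllib.request

NN_HTTP = "http://namenode-host:9870"   # hypothetical host, replace with your NameNode

def jmx_bean(query):
    """Return the first JMX bean matching the given query string."""
    with urllib.request.urlopen(f"{NN_HTTP}/jmx?qry={query}") as resp:
        return json.load(resp)["beans"][0]

fsns = jmx_bean("Hadoop:service=NameNode,name=FSNamesystem")
mem = jmx_bean("java.lang:type=Memory")

files = fsns["FilesTotal"]
blocks = fsns["BlocksTotal"]
heap_used_gib = mem["HeapMemoryUsage"]["used"] / 2**30

print(f"FilesTotal={files:,}  BlocksTotal={blocks:,}")
print(f"Heap used: {heap_used_gib:.1f} GiB  vs  rule of thumb: ~{blocks / 1e6:.0f} GB")
```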
