Datanode Heapsize Computation


Hi Everyone!


I'm having trouble finding the right DN heapsize to use.

Before, it was set to 1GB, and one of the devs recommended raising it to 32GB.

But still having issues.


Is there a way to compute how much DN heapsize to use?

Namenode heapsize has its simple formula.. 


Thank you! Any help is greatly appreciated!


Re: Datanode Heapsize Computation

Master Guru
> "But still having issues."

What issues, specifically?

A DN heap hardly ever needs to be raised, especially on recent CDH5
versions. The same rule of NN can apply to DN (for its held number of total

Unless you're seeing long GC pauses or OOME crashes on them, you shouldn't
need to raise the DN heap size.


Re: Datanode Heapsize Computation

New Contributor
This doesn't address the question though.

"Is there a way to compute how much DN heapsize to use?
Namenode heapsize has its simple formula.. "

I understand the DN heap should not need to be increased, but how do I determine the optimal base setting?

Re: Datanode Heapsize Computation

Master Guru
You can follow a similar baseline of 1 GB heap per million blocks, because the in-memory cost of block metadata is similar.
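As a rough illustration of that baseline, here is a minimal Python sketch. The function name, the round-up to whole GB, and the 1 GB floor are assumptions made for the example; only the "1 GB per million blocks" ratio comes from this thread.

```python
import math

# Rule of thumb from this thread: about 1 GB of heap per million blocks.
BLOCKS_PER_GB = 1_000_000


def baseline_dn_heap_gb(blocks_on_dn: int, minimum_gb: int = 1) -> int:
    """Estimate a starting DN heap size in whole GB.

    blocks_on_dn is the number of blocks the DataNode holds.
    Rounding up and the 1 GB floor are assumptions of this sketch,
    not part of the rule of thumb itself.
    """
    if blocks_on_dn < 0:
        raise ValueError("block count cannot be negative")
    return max(minimum_gb, math.ceil(blocks_on_dn / BLOCKS_PER_GB))


if __name__ == "__main__":
    # Hypothetical DataNode holding 2.5 million blocks.
    print(baseline_dn_heap_gb(2_500_000))
```

Treat the result as a starting point only, and adjust it based on observed GC pauses and heap usage.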

Re: Datanode Heapsize Computation




Can you clarify something? The Namenode reports X number of blocks. The 1 GB per 1 million blocks rule is based on this count, and the detailed calculations account for all three replicas. The DN block report tracks all blocks. An example is a cluster that has 40 DNs. The DNs are reporting close to 6 million blocks per DN. The NN reports only 80 million blocks (80 * 3 = 6 * 40). So, in keeping with the DN heap calc being similar to the NN one, it would then be (DN blocks reported / 3) / 1,000,000.


With that said, what I am seeing doesn't jibe with this calc. These same DNs have not had their heap usage increase over time as the block count has gone up. I would expect somewhere in the 6 GB range if the same calc is used and around 2 GB with the modified one. It fluctuates between 3 and 5 GB. The assumption that makes both calculations wrong is that all blocks are being served up by the DNs at the same time. This is not true. The handler count and replication streams limit the amount of data transferred to and from the DNs. There isn't a calculation based on them, so it comes down to monitoring and increasing the heap when OOMs occur or seem likely based on heap usage.
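To make the arithmetic in this example concrete, here is a small Python sketch that reproduces the two estimates and compares them with the observed range. The constant and function names are invented for illustration; the numbers (40 DNs, about 6 million blocks per DN, replication of 3, 3 to 5 GB observed) are the ones quoted in this post.

```python
# Numbers quoted in the post above.
DATANODES = 40
BLOCKS_PER_DN = 6_000_000      # block replicas reported by each DN
REPLICATION = 3
OBSERVED_HEAP_GB = (3, 5)      # observed DN heap usage range

BLOCKS_PER_GB = 1_000_000      # 1 GB per million blocks rule of thumb


def nn_block_count(datanodes: int, blocks_per_dn: int, replication: int) -> int:
    """Unique blocks the NN would report, given replicas spread across DNs."""
    return datanodes * blocks_per_dn // replication


def same_calc_gb(blocks_per_dn: int) -> float:
    """NN-style rule applied directly to the DN's reported block count."""
    return blocks_per_dn / BLOCKS_PER_GB


def modified_calc_gb(blocks_per_dn: int, replication: int) -> float:
    """Modified rule: (DN blocks reported / replication) / 1,000,000."""
    return blocks_per_dn / replication / BLOCKS_PER_GB


if __name__ == "__main__":
    low, high = OBSERVED_HEAP_GB
    print("NN blocks:", nn_block_count(DATANODES, BLOCKS_PER_DN, REPLICATION))
    for label, estimate in (
        ("same calc", same_calc_gb(BLOCKS_PER_DN)),
        ("modified calc", modified_calc_gb(BLOCKS_PER_DN, REPLICATION)),
    ):
        in_range = low <= estimate <= high
        print(f"{label}: {estimate:.1f} GB, within observed range: {in_range}")
```

Neither estimate falls inside the observed 3 to 5 GB range, which is consistent with the point that block count alone does not predict DN heap usage.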


Matt Bigelow