New Contributor
Posts: 3
Registered: ‎11-09-2015

Datanode Heapsize Computation

Hi Everyone!

 

I'm having trouble working out the right DN heap size to use.

It was previously set to 1 GB, and one of the devs recommended raising it to 32 GB.

But still having issues.

 

Is there a way to compute how much DN heapsize to use?

Namenode heapsize has its simple formula.. 

 

Thank you! Any help is greatly appreciated!

Posts: 1,695
Kudos: 341
Solutions: 264
Registered: ‎07-31-2013

Re: Datanode Heapsize Computation

> "But still having issues."

What issues, specifically?

A DN heap rarely needs to be raised, especially on recent CDH5
versions. The same rule as for the NN can apply to the DN (based on the
total number of blocks it holds).

Unless you're seeing long GC pauses or OOME crashes on them, you shouldn't
need to raise the DN heap size.

New Contributor
Posts: 2
Registered: ‎08-06-2015

Re: Datanode Heapsize Computation

This doesn't address the question though.

"Is there a way to compute how much DN heapsize to use?
Namenode heapsize has its simple formula.. "

I understand the DN heap should not need to be increased but how do I determine the optimal, base setting?
Posts: 1,695
Kudos: 341
Solutions: 264
Registered: ‎07-31-2013

Re: Datanode Heapsize Computation

You can follow a similar baseline of 1 GB of heap per million blocks, because the in-memory cost of block metadata is similar.
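As a rough illustration of that baseline, here is a minimal Python sketch. The function name and the 1 GB floor are assumptions made for the example (the floor simply mirrors the 1 GB setting mentioned earlier in the thread); only the 1 GB per million blocks ratio comes from the reply above. Treat the result as a starting point to check against observed heap usage, not a guaranteed requirement.

```python
# Rule of thumb from this thread: about 1 GB of heap per million blocks.
BLOCKS_PER_GB = 1_000_000


def baseline_dn_heap_gb(blocks_on_datanode, minimum_gb=1.0):
    """Estimate a baseline DataNode heap size in GB.

    blocks_on_datanode: block count reported by a single DataNode.
    minimum_gb: assumed floor for small nodes (hypothetical default).
    """
    if blocks_on_datanode < 0:
        raise ValueError("block count cannot be negative")
    return max(minimum_gb, blocks_on_datanode / BLOCKS_PER_GB)


if __name__ == "__main__":
    # A DataNode holding 6 million blocks gives a 6.0 GB baseline.
    print(baseline_dn_heap_gb(6_000_000))
```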
Posts: 642
Topics: 3
Kudos: 111
Solutions: 67
Registered: ‎08-16-2016

Re: Datanode Heapsize Computation

Harsh,

 

Can you clarify something? The NameNode reports X number of blocks. The 1 GB per 1 million blocks rule is based on this, and the detailed calculations account for all three replicas. The DN block report tracks all blocks. An example is a cluster that has 40 DNs. The DNs are reporting close to 6 million blocks per DN. The NN reports only 80 million blocks (80 * 3 = 6 * 40, in millions). So, in keeping with the DN heap calc being similar to the NN's, it would then be (DN block report / 3) / 1,000,000.

 

With that said, what I am seeing doesn't jibe with this calc. These same DNs have not had their heap usage increase over time as the block count has gone up. I would expect somewhere in the 6 GB range if the same calc is used and around 2 GB with the modified one. It fluctuates between 3 and 5 GB. The assumption that makes both calculations wrong is that all blocks are being served up by the DNs at the same time. This is not true. The handler count and replication streams limit the amount of data transferred to and from the DNs. There isn't a calculation based on them; it comes down to monitoring and increasing the heap when OOMs occur or seem likely based on heap usage.
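To spell out the arithmetic behind the two estimates in this example, here is a small Python sketch. The function names are made up for illustration, and the replication factor of 3 is taken from the "all three replicas" remark in this post. It reproduces the numbers only; as noted, neither figure matched the heap usage actually observed.

```python
BLOCKS_PER_GB = 1_000_000  # 1 GB of heap per million blocks (rule of thumb)


def nn_block_count(datanodes, blocks_per_dn, replication=3):
    """Blocks seen by the NameNode, given per-DataNode replica counts."""
    return datanodes * blocks_per_dn // replication


def same_calc_gb(blocks_per_dn):
    """NN-style rule applied directly to the DataNode block report."""
    return blocks_per_dn / BLOCKS_PER_GB


def modified_calc_gb(blocks_per_dn, replication=3):
    """Variant that divides the DataNode block report by the replica count."""
    return blocks_per_dn / replication / BLOCKS_PER_GB


if __name__ == "__main__":
    datanodes = 40
    blocks_per_dn = 6_000_000

    print(nn_block_count(datanodes, blocks_per_dn))  # 80000000
    print(same_calc_gb(blocks_per_dn))               # 6.0
    print(modified_calc_gb(blocks_per_dn))           # 2.0
```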

 

Matt Bigelow
