I'm having trouble working out the right DN heap size to use.
It was previously set to 1 GB, and one of the devs recommended raising it to 32 GB.
But I'm still having issues.
Is there a way to compute how much DN heapsize to use?
The NameNode heap size has a simple formula.
Thank you! Any help is greatly appreciated!
Can you clarify something? The NameNode reports X number of blocks. The 1 GB per 1 million blocks rule is based on this count, and the detailed calculations account for all three replicas. The DN block report tracks all blocks. An example is a cluster that has 40 DNs. The DNs are reporting close to 6 million blocks per DN. The NN reports only 80 million blocks (80 million * 3 = 6 million * 40). So, in keeping with the DN heap calc being similar to the NN's, it would then be (DN block report count / 3) / 1,000,000, in GB.
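To make the arithmetic concrete, here is a minimal Python sketch of the two estimates for the example cluster. The function and variable names are my own, and the 1 GB per 1 million blocks figure is just the NN rule of thumb mentioned above applied to a DN, not an official DN formula.

```python
# Rule of thumb borrowed from the NameNode: 1 GB of heap per 1 million blocks.
# Applying it to a DataNode is an assumption being tested here.
BLOCKS_PER_GB = 1_000_000


def heap_estimate_gb(blocks: float) -> float:
    """Heap estimate in GB for a given block count."""
    return blocks / BLOCKS_PER_GB


datanodes = 40                # DNs in the example cluster
blocks_per_dn = 6_000_000     # blocks each DN reports (replicas included)
replication = 3               # replication factor

# Sanity check: total replicas across DNs vs. blocks seen by the NN.
total_replicas = datanodes * blocks_per_dn        # 240,000,000
nn_blocks = total_replicas // replication         # 80,000,000

same_calc_gb = heap_estimate_gb(blocks_per_dn)                 # 6.0
modified_gb = heap_estimate_gb(blocks_per_dn / replication)    # 2.0

print(f"NN blocks: {nn_blocks:,}")
print(f"Same calc as NN: {same_calc_gb} GB, modified: {modified_gb} GB")
```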
With that said, what I am seeing doesn't jibe with this calc. These same DNs have not had their heap usage increase over time as the block count has gone up. I would expect somewhere in the 6 GB range if the same calc is used and around 2 GB with the modified one. Instead, it fluctuates between 3 and 5 GB. The assumption that makes both calculations wrong is that all blocks are being served up by the DNs at the same time. This is not true. The handler count and replication streams limit the amount of data transferred to and from the DNs. There isn't a calculation based on them. It comes down to monitoring and increasing the heap when OOMs occur or seem likely based on heap usage.
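Since this comes down to monitoring rather than a formula, a rough sketch of that kind of check might look like the following. Everything here is hypothetical: the function name, the 80% threshold, and the sample values are made up for illustration, and how you collect the heap usage samples (JMX, GC logs, your monitoring tool) is up to you.

```python
def heap_needs_increase(used_gb_samples, max_heap_gb, threshold=0.8):
    """Return True if peak observed heap usage is at or above
    `threshold` (a fraction) of the configured max heap.

    The 0.8 default is an arbitrary illustrative value, not a
    Hadoop recommendation.
    """
    peak = max(used_gb_samples)
    return peak / max_heap_gb >= threshold


# Illustrative samples in the 3 - 5 GB range (not real measurements).
samples_gb = [3.1, 4.2, 5.0, 3.6]

# With a 6 GB heap the 5.0 GB peak is about 83% of max: likely too tight.
print(heap_needs_increase(samples_gb, max_heap_gb=6))

# With a much larger heap the same usage leaves plenty of headroom.
print(heap_needs_increase(samples_gb, max_heap_gb=32))
```

The point is only that the decision is driven by observed usage against the configured maximum, not by block count.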