@aasf, I think it has been consolidated on this page (see page 2 of the embedded doc): https://www.cloudera.com/content/dam/www/marketing/resources/datasheets/cloudera-enterprise-datasheet.pdf.landing.html#cmfeig_topic_5_1
If you have any other questions, we are happy to assist.
Harsh, can you clarify something? The NameNode reports X number of blocks. The 1 GB per 1 million blocks guideline is based on this count, and the detailed calculation accounts for all three replicas. The DN block report, by contrast, tracks every block replica stored on that DN. For example, take a cluster with 40 DNs, each reporting close to 6 million blocks. The NN reports only 80 million blocks (80 million * 3 replicas = 6 million * 40 DNs = 240 million). So, to keep the DN heap calculation consistent with the NN's, it would be (DN block report / 3) / 1000000, in GB.

That said, what I am seeing doesn't match this calculation. These same DNs have not seen their heap usage increase over time as the block count has gone up. I would expect something around 6 GB if the same calculation is used, and around 2 GB with the modified one. Instead, it fluctuates between 3 and 5 GB.

The assumption that makes both calculations wrong is that all blocks are being served by the DNs at the same time. This is not true. The handler count and replication streams limit the amount of data transferred to and from the DNs. There isn't a calculation based on them; the approach is simply to monitor heap usage and increase the heap when OOMs occur or seem likely.

Matt Bigelow
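The arithmetic above can be sketched as a few lines of Python. This is only an illustration of the two rule-of-thumb estimates discussed in the post (1 GB of heap per 1 million blocks, applied either directly to a DN's block report or after dividing by the replication factor of 3); the function names and constant are hypothetical, not part of any Hadoop or Cloudera API.

```python
# Rule of thumb quoted in the thread: roughly 1 GB of heap per 1 million blocks.
# All names here are hypothetical illustrations, not a Hadoop/Cloudera API.

GB_PER_MILLION_BLOCKS = 1.0


def cluster_blocks(blocks_per_dn: int, num_dns: int, replication: int = 3) -> int:
    """Unique blocks the NameNode would report, given each DN's replica count."""
    return blocks_per_dn * num_dns // replication


def nn_style_estimate_gb(blocks_reported_by_dn: int) -> float:
    """Apply the NameNode rule of thumb directly to one DN's block report."""
    return blocks_reported_by_dn / 1_000_000 * GB_PER_MILLION_BLOCKS


def replica_adjusted_estimate_gb(blocks_reported_by_dn: int, replication: int = 3) -> float:
    """Modified form: divide the DN's reported replicas by the replication factor first."""
    return (blocks_reported_by_dn / replication) / 1_000_000 * GB_PER_MILLION_BLOCKS


if __name__ == "__main__":
    per_dn = 6_000_000  # ~6 million blocks reported by each DN
    dns = 40            # 40 DataNodes in the example cluster
    print(cluster_blocks(per_dn, dns))           # 80000000
    print(nn_style_estimate_gb(per_dn))          # 6.0
    print(replica_adjusted_estimate_gb(per_dn))  # 2.0
```

The observed 3 to 5 GB falls between these two figures, which is consistent with the point above: neither estimate models how many blocks are actually being transferred at once, so monitoring actual heap usage remains the practical guide.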