There are a lot of articles on NameNode heap calculation, but none on the DataNode.
1. How to calculate the DataNode heap size?
2. How to calculate the size of each object in the DataNode heap?
3. What does the metadata in the DataNode heap contain? It cannot be the same as the NameNode's (it does not hold replication details, etc.), and it should also include metadata for the stored checksums, so what does DataNode metadata look like? How is it different from NameNode metadata?
Great question. Unfortunately, I don't think there is a well-agreed-upon formula or calculator out there, as "it depends" is so often the rule. Some considerations: the DataNode doesn't really know about the directory structure; it just stores (and copies, deletes, etc.) blocks as directed by the NameNode (often indirectly, since clients write the actual blocks). Additionally, the block-level checksums are stored on disk alongside the block files rather than held in the heap.
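To give a feel for how much checksum data sits on disk rather than in the heap, here is a rough sketch that estimates the size of the `.meta` file stored next to a block file. It assumes the usual defaults as I understand them: `dfs.bytes-per-checksum` of 512, a 4-byte CRC per chunk, and a 7-byte header at the start of the meta file. The function name `meta_file_size` is made up for illustration, and your cluster's settings may differ, so treat the numbers as an estimate only.

```python
# Assumed layout of a block's .meta file: a small fixed header followed by
# one CRC value per checksummed chunk of block data.
HEADER_BYTES = 7  # assumption: 2-byte version + 1-byte type + 4-byte chunk size


def meta_file_size(block_bytes, bytes_per_checksum=512, checksum_bytes=4):
    """Estimate the on-disk size of the checksum (.meta) file for one block.

    bytes_per_checksum mirrors dfs.bytes-per-checksum (assumed default 512);
    checksum_bytes is the width of one CRC32/CRC32C value.
    """
    # Ceiling division: a partial trailing chunk still gets its own checksum.
    chunks = -(-block_bytes // bytes_per_checksum)
    return HEADER_BYTES + chunks * checksum_bytes


if __name__ == "__main__":
    full_block = 128 * 1024 * 1024  # a full 128 MB block
    print(meta_file_size(full_block))  # roughly 1 MB of checksums per block
```

Under these assumptions the checksum data for a full 128 MB block is about 1 MB, and it lives on the DataNode's disks, so it should not be counted when sizing the heap.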
It looks like there's some good info in the following HCC questions that might help you.