
Cause of uneven data distribution in case of heterogeneous nodes in cluster


I have a cluster with 6 DataNodes (plus 2 NameNodes) running CDH 5.9.

The DataNode specs are as follows:

4 DN: 180 GB RAM, 30 TB disk space each

2 DN: 55 GB RAM, 15 TB disk space each


The 4 larger nodes already existed, and we added the other 2 later. When adding them, we decided to keep the same role configuration as on the older 4. This obviously overcommitted memory on the 2 smaller nodes. Over time we saw DFS disk usage on the 2 newer nodes climb higher (80%) than on the 4 older nodes (65%).
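To put the percentages in absolute terms, here is a rough calculation using only the figures above. It is a minimal sketch, not a diagnosis: the group labels and variable names are made up for illustration, and the percentages are approximate readings, so treat the results as estimates.

```python
# Figures copied from the post; the percentages are approximate readings.
# Group names and field names are illustrative, not taken from any tool output.
groups = [
    {"name": "older nodes", "count": 4, "capacity_tb": 30, "used_fraction": 0.65},
    {"name": "newer nodes", "count": 2, "capacity_tb": 15, "used_fraction": 0.80},
]


def used_tb(capacity_tb, used_fraction):
    """Absolute DFS usage of one node, in TB."""
    return capacity_tb * used_fraction


# Cluster-wide totals across all six DataNodes.
total_capacity = sum(g["count"] * g["capacity_tb"] for g in groups)
total_used = sum(
    g["count"] * used_tb(g["capacity_tb"], g["used_fraction"]) for g in groups
)
cluster_fraction = total_used / total_capacity

for g in groups:
    actual = used_tb(g["capacity_tb"], g["used_fraction"])
    # What each node would hold if every node sat at the cluster-wide percentage.
    even = g["capacity_tb"] * cluster_fraction
    print(
        f"{g['name']}: ~{actual:.1f} TB used per node, "
        f"~{even:.1f} TB if usage were even by percentage"
    )

print(f"cluster-wide usage: ~{cluster_fraction:.0%}")
```

By this estimate each older node holds about 19.5 TB and each newer node about 12 TB, with the cluster as a whole at roughly 68%. So the newer nodes hold less data in absolute terms, but more than their share relative to capacity.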


So I want to understand whether the memory overcommit is what causes these two DataNodes to be used more, or whether there is something else I need to tune.