Cause of uneven data distribution in case of heterogeneous nodes in cluster

I have a cluster with 6 DataNodes (plus 2 NameNodes) running CDH 5.9.

The DataNode specs are as follows:

4 DataNodes: 180 GB RAM, 30 TB disk space each

2 DataNodes: 55 GB RAM, 15 TB disk space each

 

Four nodes already existed, and we added the other two later. When adding them, we decided to keep the same role configuration as on the older four. This obviously overcommitted memory on the two smaller nodes. Over time, we saw DFS disk usage on the two newer nodes climb higher (80%) than on the four older nodes (65%).
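
To put those percentages in absolute terms, here is a rough sketch of the arithmetic. It assumes the percentages are relative to the raw disk sizes listed above (the configured DFS capacity per node may be somewhat lower), and the group labels are just names I made up for illustration.

```python
# Node groups from the post: (node count, disk capacity in TB, DFS used fraction).
# Assumption: the used fractions apply to the raw disk sizes quoted above.
groups = {
    "older (30 TB)": (4, 30.0, 0.65),
    "newer (15 TB)": (2, 15.0, 0.80),
}


def used_tb(capacity_tb, used_fraction):
    """Absolute DFS usage on one node, in TB."""
    return capacity_tb * used_fraction


# Per-node absolute usage for each group.
per_node = {name: used_tb(cap, frac) for name, (_, cap, frac) in groups.items()}

# Cluster-wide totals, used to get the overall utilization.
total_used = sum(n * used_tb(cap, frac) for n, cap, frac in groups.values())
total_capacity = sum(n * cap for n, cap, _ in groups.values())
cluster_fraction = total_used / total_capacity

for name, tb in per_node.items():
    print(f"{name}: {tb:.1f} TB used per node")
print(f"cluster-wide: {cluster_fraction:.1%} used")
```

Under that assumption, each newer node holds less data in absolute terms than each older node, but it is a larger share of its smaller capacity.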

 

So I want to understand whether the memory overcommit is what causes these two DataNodes to be used more heavily, or whether there is something else I need to tune.