Rebalancing differently sized nodes

New Contributor

I would like to clarify my understanding of how HDFS rebalancing works.

I have a cluster composed of two node types. The first half of the data nodes has twice the disk capacity of the second half. At the moment, data is distributed roughly uniformly across the cluster by volume. As a result, the smaller nodes are running at over 85% disk usage, while the larger nodes are at about 50%.

Do I understand correctly that when I turn on rebalancing and set the HDFS Rebalancing Threshold to 10.0 (10%), the cluster will rebalance with respect to the relative disk usage on each data node, and that this will result in something like 65% disk usage on all nodes, regardless of the physical disk capacity of each node?

1 ACCEPTED SOLUTION

Mentor
Yes, that is precisely correct: it balances by average utilization
percentage per node rather than by average byte count.
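
To make the arithmetic concrete, here is a minimal sketch in Python. It is not HDFS code: the function names are hypothetical, the capacities are made-up units, and it assumes an equal number of large and small nodes as described in the question. It computes the cluster-wide utilization percentage and checks whether every node falls within the threshold of that value, which is the condition the balancer works toward.

```python
def cluster_utilization(nodes):
    """Cluster-wide utilization in percent.

    nodes: list of (capacity, used) tuples in the same (arbitrary) unit.
    """
    total_capacity = sum(capacity for capacity, _ in nodes)
    total_used = sum(used for _, used in nodes)
    return 100.0 * total_used / total_capacity


def is_balanced(nodes, threshold):
    """True if every node's utilization is within `threshold`
    percentage points of the cluster-wide utilization."""
    average = cluster_utilization(nodes)
    return all(
        abs(100.0 * used / capacity - average) <= threshold
        for capacity, used in nodes
    )


# Hypothetical cluster: two large nodes (capacity 200, 50% used)
# and two small nodes (capacity 100, 85% used).
nodes = [(200, 100), (200, 100), (100, 85), (100, 85)]

average = cluster_utilization(nodes)
print(f"cluster utilization: {average:.1f}%")
print(f"acceptable band at threshold 10: "
      f"{average - 10:.1f}% to {average + 10:.1f}%")
print("balanced now?", is_balanced(nodes, 10.0))
```

With these assumed numbers the cluster-wide utilization is about 62%. A threshold of 10 means the balancer can stop once every node sits within 10 percentage points of that figure, so nodes end up in a band around the average rather than at one identical percentage. A smaller threshold gives a tighter spread at the cost of moving more data.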
