
Rebalancing differently sized nodes

I would like to clarify my understanding of how rebalancing works.


I have a cluster composed of two node types: half of the DataNodes have twice the disk capacity of the other half. At the moment, data is spread fairly evenly across the nodes in absolute terms (roughly the same number of bytes per node). As a result, the smaller nodes are above 85% disk usage while the larger nodes are at about 50%.


Do I understand correctly that, when I turn on rebalancing and set the HDFS Rebalancing Threshold to 10.0 (10%), the cluster will rebalance based on relative disk usage on each DataNode, so that afterwards all nodes end up at something like 65% disk usage regardless of their physical disk capacity?
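Taking the figures in the question at face value (assuming equal numbers n of each node type, smaller nodes of capacity C at exactly 85%, larger nodes of capacity 2C at exactly 50%, and treating "disk usage" as used space over capacity), the cluster-wide average utilization works out as:

```latex
\bar{u} = \frac{\sum_i \mathrm{used}_i}{\sum_i \mathrm{capacity}_i}
        = \frac{n\,(0.85\,C) + n\,(0.50 \cdot 2C)}{n\,C + n\,(2C)}
        = \frac{1.85\,C}{3\,C} \approx 61.7\%
```

With a 10% threshold, a balanced cluster under these assumptions would have every node somewhere between roughly 51.7% and 71.7% utilization, rather than all at one identical figure.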


Re: Rebalancing differently sized nodes

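Yes, that is correct: it balances by utilization percentage per node (used space as a fraction of that node's own capacity) rather than by absolute byte count, so each node ends up within the threshold of the cluster-wide average utilization regardless of its physical size.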
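A minimal Python sketch of the "balanced" test described above, assuming the HDFS balancer's documented definition: a cluster is balanced when every DataNode's utilization (used / capacity) differs from the cluster-wide utilization (total used / total capacity) by no more than the threshold. The `DataNode` class, node names, and GB figures are hypothetical, chosen to mirror the 85% / 50% situation in the question; this is only the stopping criterion, not the balancer's block-moving logic.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    # Hypothetical model of a node; the balancer itself uses DFS used/capacity.
    name: str
    capacity_gb: float
    used_gb: float

    @property
    def utilization(self) -> float:
        """Used space as a percentage of this node's own capacity."""
        return 100.0 * self.used_gb / self.capacity_gb


def cluster_utilization(nodes: list[DataNode]) -> float:
    """Cluster-wide utilization: total used over total capacity, in percent."""
    total_used = sum(n.used_gb for n in nodes)
    total_capacity = sum(n.capacity_gb for n in nodes)
    return 100.0 * total_used / total_capacity


def is_balanced(nodes: list[DataNode], threshold: float = 10.0) -> bool:
    """True if every node is within `threshold` percentage points of the
    cluster-wide utilization, independent of how big each node is."""
    avg = cluster_utilization(nodes)
    return all(abs(n.utilization - avg) <= threshold for n in nodes)


# Before: roughly equal bytes per node, so small nodes sit at 85%, large at 50%.
before = [
    DataNode("small-1", 4000, 3400),
    DataNode("small-2", 4000, 3400),
    DataNode("large-1", 8000, 4000),
    DataNode("large-2", 8000, 4000),
]

# After: same total bytes, moved so each node is near the ~61.7% average.
after = [
    DataNode("small-1", 4000, 2500),
    DataNode("small-2", 4000, 2500),
    DataNode("large-1", 8000, 4900),
    DataNode("large-2", 8000, 4900),
]

print(is_balanced(before))  # False: small nodes are ~23 points above average
print(is_balanced(after))   # True: every node is within 10 points
```

Note that the larger nodes end up holding about twice as many bytes as the smaller ones in the balanced state, which is exactly the "by percentage, not by byte count" behaviour described in the answer.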