02-02-2018 02:16 AM
I would like to clarify my understanding of how rebalancing works.
I have a cluster composed of two node types. The first half of the data nodes has twice as much disk capacity as the second half. At the moment, data is distributed quite uniformly across the cluster in terms of data volume. This causes the nodes with less disk space to run at over 85% disk usage, while the larger nodes are at about 50% disk usage.
Do I understand correctly that when I turn on rebalancing and set the HDFS Rebalancing Threshold to 10.0 (10%), the cluster will rebalance with respect to the relative disk usage of each data node, and that the rebalancing will result in something like 65% disk usage on all nodes, regardless of the physical disk capacity of each node?
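To make the arithmetic behind my question concrete, here is a minimal sketch of what I expect. It assumes the balancer compares each data node's utilization (used / capacity) with the cluster-wide utilization (total used / total capacity) and treats a node as balanced when the two differ by no more than the threshold, which is how I read the HDFS balancer documentation. The node count, the capacity units and the helper names are made up for illustration.

```python
# Hypothetical cluster: 4 large nodes (capacity 2.0 units, 50% used)
# and 4 small nodes (capacity 1.0 unit, 85% used).
# Each entry is (capacity, fraction_used).
nodes = [(2.0, 0.50)] * 4 + [(1.0, 0.85)] * 4


def cluster_utilization(nodes):
    """Cluster-wide utilization: total used space / total capacity."""
    total_used = sum(capacity * used for capacity, used in nodes)
    total_capacity = sum(capacity for capacity, _ in nodes)
    return total_used / total_capacity


def balanced_band(nodes, threshold_pct):
    """Utilization range (in percent) a node must fall into to count as balanced."""
    avg_pct = cluster_utilization(nodes) * 100
    return avg_pct - threshold_pct, avg_pct + threshold_pct


avg_pct = cluster_utilization(nodes) * 100
low, high = balanced_band(nodes, 10.0)

print(f"cluster utilization: {avg_pct:.1f}%")
print(f"balanced band at threshold 10: {low:.1f}% to {high:.1f}%")

for capacity, used in nodes:
    pct = used * 100
    state = "balanced" if low <= pct <= high else "needs rebalancing"
    print(f"capacity={capacity} used={pct:.0f}% -> {state}")
```

Under these assumptions the cluster-wide utilization is about 62%, so with a threshold of 10 every node would only need to end up somewhere between roughly 52% and 72%, rather than at exactly the same percentage.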