@BrianChan
Cluster Average Utilization Calculation: The cluster average utilization during HDFS rebalancing is typically calculated based on the configured capacity of the cluster. The configured capacity represents the total storage capacity allocated to the HDFS cluster as defined in the cluster's configuration settings.
Individual Utilization Calculation: Individual utilization during HDFS rebalancing is usually calculated based on the sum of DFS used and remaining space for each datanode. This calculation provides an accurate representation of how much storage is currently being utilized on each datanode and how much space is available for additional data storage.
Difference in File Moving Size: The difference between the initially reported file moving size and the actual file moving size in the balancer log can occur due to various factors. These may include changes in data distribution across datanodes during the rebalancing process, optimizations performed by the balancer algorithm, or adjustments made based on real-time cluster conditions and performance considerations.
Exceeding DataNode Balancing Bandwidth: While the datanode balancing bandwidth is configured to limit the amount of data transferred between datanodes per second during HDFS rebalancing, it's possible for the actual bandwidth consumption to exceed this limit under certain circumstances. Factors such as network congestion, variations in data transfer rates, or optimizations performed by the balancer algorithm can contribute to bandwidth consumption exceeding the configured limit.
Regards,
Chethan YM