Created on 06-10-2023 02:47 AM - edited on 06-12-2023 11:01 PM by VidyaSargur
Summary
It is a good idea to tune the HDFS Rebalancer away from its default values so that the cluster hardware is used at an optimal level while HDFS rebalancing activities are running.
We also recommend modifying how the NameNode chooses to place blocks, by way of its Block Placement Policy, when configuring HDFS and testing rebalancing on a heavily heterogeneous cluster.
Some key notes about performing rebalancing activities after the services and disks have been set up:
Never run the HDFS and Kudu Rebalancers at the same time, because contention between the two may cause issues.
Perform the rebalancing activities in the order of Kudu first, HDFS second, because Kudu is unable to track capacity utilization.
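The ordering rule above can be sketched as a small Python helper that runs the two rebalancers strictly one after the other. This is only a minimal sketch, not a Cloudera-provided tool: the function names rebalance_commands and run_in_order and the master address in the usage note are hypothetical, and it assumes the kudu and hdfs CLIs are on the PATH of a host and user that are allowed to run them.

```python
import subprocess
from typing import Callable, List


def rebalance_commands(kudu_masters: str) -> List[List[str]]:
    """Return the rebalancer commands in the recommended order."""
    return [
        # Kudu first: its rebalancer does not track capacity utilization.
        ["kudu", "cluster", "rebalance", kudu_masters],
        # HDFS second, started only after the Kudu run has finished.
        ["hdfs", "balancer"],
    ]


def run_in_order(commands: List[List[str]],
                 runner: Callable = subprocess.run) -> None:
    """Run each command to completion before starting the next one."""
    for cmd in commands:
        # subprocess.run blocks until the process exits, so the two
        # rebalancers never overlap. check=True raises on a non-zero
        # exit, which stops the sequence before the next command starts.
        runner(cmd, check=True)
```

For example, run_in_order(rebalance_commands("kudu-master-1:7051")) would start the HDFS Balancer only after the Kudu rebalancer has exited successfully.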
Performing HDFS Rebalancing Activities
CLI commands are also available for rebalancing HDFS. However, we recommend performing these actions from within Cloudera Manager (CM), which shows the active status of the Rebalancer and when the action was started and completed.
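For reference, the CLI route mentioned above might look like the following sketch, which only builds the command lines and does not run them. The hdfs balancer options -threshold and -policy and the hdfs dfsadmin -setBalancerBandwidth command are standard Hadoop CLI features; the helper names and the values in the usage note are illustrative assumptions, not recommended settings.

```python
from typing import List, Optional


def balancer_command(threshold: Optional[float] = None,
                     policy: Optional[str] = None) -> List[str]:
    """Build an `hdfs balancer` command line (assumed helper name)."""
    cmd = ["hdfs", "balancer"]
    if threshold is not None:
        # Allowed deviation, in percent of disk capacity, between a
        # DataNode's utilization and the cluster average.
        cmd += ["-threshold", str(threshold)]
    if policy is not None:
        # The Balancer accepts "datanode" or "blockpool" here.
        if policy not in ("datanode", "blockpool"):
            raise ValueError(f"unknown balancer policy: {policy}")
        cmd += ["-policy", policy]
    return cmd


def bandwidth_command(bytes_per_second: int) -> List[str]:
    """Build the command that sets per-DataNode balancing bandwidth."""
    if bytes_per_second <= 0:
        raise ValueError("bandwidth must be a positive number of bytes/s")
    return ["hdfs", "dfsadmin", "-setBalancerBandwidth",
            str(bytes_per_second)]
```

For example, " ".join(balancer_command(threshold=10, policy="datanode")) yields hdfs balancer -threshold 10 -policy datanode, which can be reviewed before it is run by hand. Unlike CM, the CLI gives no central view of when the run started or completed.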