HDFS Balancer is a tool for balancing the data across the storage devices of an HDFS cluster. The Balancer was originally designed to run slowly so that the balancing activities do not affect normal cluster operations and running jobs. We have received feedback from the HDFS community that it would also be desirable if Balancer could be configured to run faster. The use cases are listed below.
Free up space on some nearly full datanodes.
Move data to some newly added datanodes in order to utilize the new machines.
Run Balancer when the cluster load is low or in a maintenance window, instead of running it as a background daemon.
We have changed Balancer to address these new use cases. After the changes, Balancer is able to run 100x faster, while it can still be configured to run slowly as before. In one of our tests, we were able to increase its throughput from a few gigabytes per minute to a terabyte per minute.
In addition, we added two new features: source datanodes and block pinning. Users can specify the source datanodes so that they can free up space on particular datanodes using Balancer. A block-distribution-aware application can pin its block replicas to particular datanodes so that the pinned replicas will not be moved for cluster balancing.
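For example, an application such as HBase can use the favored-nodes variant of DistributedFileSystem#create to place replicas on specific datanodes; when block pinning is enabled on the datanodes (the dfs.datanode.block-pinning.enabled property), those replicas are pinned and Balancer skips them. The following is a minimal sketch, assuming fs.defaultFS points at an HDFS cluster; the datanode addresses are hypothetical placeholders.

    import java.net.InetSocketAddress;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    // Sketch: write a file with favored nodes so that, with
    // dfs.datanode.block-pinning.enabled=true on the datanodes,
    // the replicas placed on those nodes are pinned and will not
    // be moved by Balancer.
    public class PinnedWrite {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumes fs.defaultFS is an hdfs:// URI.
        DistributedFileSystem dfs =
            (DistributedFileSystem) FileSystem.get(conf);

        // Hypothetical datanode addresses; replace with real hosts.
        InetSocketAddress[] favoredNodes = {
            new InetSocketAddress("dn1.example.com", 50010),
            new InetSocketAddress("dn2.example.com", 50010),
            new InetSocketAddress("dn3.example.com", 50010)
        };

        try (FSDataOutputStream out = dfs.create(
            new Path("/apps/example/pinned-file"),
            FsPermission.getFileDefault(),
            true,                                       // overwrite
            conf.getInt("io.file.buffer.size", 4096),   // buffer size
            (short) 3,                                  // replication
            dfs.getDefaultBlockSize(),
            null,                                       // progress
            favoredNodes)) {
          out.writeBytes("data placed on the favored datanodes");
        }
      }
    }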
Why is the data stored in HDFS imbalanced?
There are three major reasons.
1. Adding Datanodes
When new datanodes are added to a cluster, newly created blocks will be written to them as part of normal block allocation. However, the existing blocks will not be moved to them without using Balancer.
2. Client Behavior
In some cases, a client application may not write data uniformly across the datanode machines. A client application may be skewed in writing data, always writing to certain machines but not to others. HBase is an example of such an application. In other cases, the client application is not skewed by design, as with MapReduce/YARN jobs; however, the input data is skewed, so some of the job tasks write significantly more data than others. When a datanode receives data directly from a client, it stores a copy on its local storage to preserve data locality. As a result, the datanodes receiving more data usually have higher storage utilization.
3. Block Allocation in HDFS
HDFS uses a constraint satisfaction algorithm to allocate file blocks. Once the constraints are satisfied, HDFS allocates a block by selecting a storage device uniformly at random from the candidate set. For large clusters, the blocks are essentially allocated uniformly at random, provided that the client applications write data to HDFS uniformly across the datanode machines. Note that uniform random allocation may not result in a uniform data distribution because of randomness, as illustrated by the sketch below. This is usually not a problem when the cluster has sufficient free space, but it becomes serious when the cluster is nearly full.
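To see why uniform random allocation alone does not yield a perfectly uniform distribution, consider the following balls-into-bins simulation (hypothetical illustration code, not HDFS internals): it assigns blocks to datanodes uniformly at random and prints the spread between the least and most loaded nodes.

    import java.util.Random;

    // Balls-into-bins sketch: allocate blocks uniformly at random
    // across datanodes and measure the resulting imbalance.
    public class AllocationSkew {
      public static void main(String[] args) {
        final int datanodes = 100;
        final int blocks = 1_000_000;
        final long[] blocksPerNode = new long[datanodes];
        final Random random = new Random();

        for (int b = 0; b < blocks; b++) {
          // Uniform random selection, analogous to HDFS block
          // allocation once the placement constraints are satisfied.
          blocksPerNode[random.nextInt(datanodes)]++;
        }

        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (long n : blocksPerNode) {
          min = Math.min(min, n);
          max = Math.max(max, n);
        }
        // Expected count per node is blocks/datanodes = 10,000, but
        // the observed min and max typically differ by a few percent.
        System.out.println("expected=" + (blocks / datanodes)
            + " min=" + min + " max=" + max);
      }
    }

Even under a perfectly uniform random policy, some nodes end up with noticeably more blocks than others. On a nearly full cluster, that spread is enough to fill up some datanodes while others still have free space.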
In the next article, we will explain the usage of the original configurations and CLI options of Balancer, as well as the new configurations and CLI options added by the recent enhancements.