
PROBLEM: The HDFS Balancer exits within a few minutes without moving any blocks.

SYMPTOMS: The balancer exits with the following messages:

16/11/22 07:08:29 DEBUG ipc.Client: IPC Client (280134559) connection to from hdfs-EST@HADOOP.XXX.CORP.EXAMPLE.COM got value #1193
16/11/22 07:08:29 DEBUG ipc.ProtobufRpcEngine: Call: getBlocks took 2486ms
No block has been moved for 5 iterations. Exiting...
Nov 22, 2016 7:08:29 AM           4                  0 B            35.86 TB             200 GB
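The exit message describes a no-progress counter. A reasonable reading is that every iteration that moves no bytes increments the counter, and any progress resets it. The following is a minimal Java sketch of that rule. It assumes a limit of five consecutive idle iterations, taken from the message, and uses a hypothetical `exitIteration` helper. It models the behaviour only and is not the balancer's source.

```java
/**
 * Hypothetical sketch of the exit rule suggested by
 * "No block has been moved for 5 iterations. Exiting...".
 * A simplified model, not the balancer's actual source.
 */
public class NoMoveExitSketch {

    // Assumed limit, read off the "5 iterations" in the log message.
    static final int MAX_NOT_CHANGED_ITERATIONS = 5;

    /**
     * Returns the 0-based iteration at which the run would stop, or -1 if the
     * limit is never reached. bytesMoved[i] is the number of bytes actually
     * moved during iteration i.
     */
    static int exitIteration(long[] bytesMoved) {
        int notChanged = 0;
        for (int i = 0; i < bytesMoved.length; i++) {
            if (bytesMoved[i] > 0) {
                notChanged = 0;               // progress resets the counter
            } else if (++notChanged >= MAX_NOT_CHANGED_ITERATIONS) {
                return i;                     // limit of consecutive idle iterations reached
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Moves are scheduled every iteration, but none ever completes.
        long[] stuck = {0, 0, 0, 0, 0};
        System.out.println("exits after iteration " + exitIteration(stuck));
    }
}
```

The sketch numbers iterations from 0, so five idle iterations end at iteration 4. This is consistent with the iteration number on the last status line above.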

ROOT CAUSE: The DataNodes were distributed across racks as follows (rack : number of DataNodes):

/default-rack : 91 
/Example1 : 18 
/Example2 : 2 

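The 100%-utilized nodes that we were trying to rebalance to free up space were exactly the 20 nodes registered under racks /Example1 and /Example2. The balancer moves a block only when the move satisfies its rack-awareness rules, listed below; rule 3 is the one that matters here. With this layout, not a single block could be moved without compromising fault tolerance. For example, suppose a block's other replicas are all on /default-rack. Moving its only /Example1 replica to a /default-rack node would leave every replica on one rack, so rule 3 rejects the move.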

  /**
   * Decide if the block is a good candidate to be moved from source to target.
   * A block is a good candidate if
   * 1. the block is not in the process of being moved/has not been moved;
   * 2. the block does not have a replica on the target;
   * 3. doing the move does not reduce the number of racks that the block has
   */
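The three rules can be expressed as a small predicate. The Java sketch below is a simplified model, not the balancer's actual implementation, and makes these assumptions:

- Topology is a plain DataNode-to-rack map.
- Rule 3 is read as "the block spans no fewer racks after the move than before".
- Node names (`dr1`, `ex1a`, ...) and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Simplified model of the three candidate rules quoted above.
 * Hypothetical names throughout; not the balancer's actual code.
 */
public class GoodBlockCandidateSketch {

    /** Number of distinct racks spanned by the given DataNodes. */
    static int rackCount(List<String> nodes, Map<String, String> rackOf) {
        Set<String> racks = new HashSet<>();
        for (String node : nodes) {
            racks.add(rackOf.get(node));
        }
        return racks.size();
    }

    static boolean isGoodBlockCandidate(String blockId, List<String> replicas,
                                        String source, String target,
                                        Set<String> movedOrMoving,
                                        Map<String, String> rackOf) {
        // Rule 1: not in the process of being moved / not already moved.
        if (movedOrMoving.contains(blockId)) {
            return false;
        }
        // Rule 2: the target must not already hold a replica of the block.
        if (replicas.contains(target)) {
            return false;
        }
        // Rule 3: swapping source for target must not reduce the rack count.
        List<String> after = new ArrayList<>(replicas);
        after.remove(source);
        after.add(target);
        return rackCount(after, rackOf) >= rackCount(replicas, rackOf);
    }

    /** Toy topology reusing the article's rack names; node names are hypothetical. */
    static Map<String, String> exampleRacks() {
        return Map.of(
                "dr1", "/default-rack", "dr2", "/default-rack", "dr3", "/default-rack",
                "ex1a", "/Example1", "ex1b", "/Example1");
    }

    public static void main(String[] args) {
        Map<String, String> racks = exampleRacks();
        // Only replica outside /default-rack: moving it collapses the block onto one rack.
        System.out.println(isGoodBlockCandidate("b1", List.of("dr1", "dr2", "ex1a"),
                "ex1a", "dr3", Set.of(), racks));
        // Another replica stays on /Example1, so the block still spans two racks.
        System.out.println(isGoodBlockCandidate("b2", List.of("dr1", "ex1a", "ex1b"),
                "ex1a", "dr3", Set.of(), racks));
    }
}
```

The first call prints `false` and mirrors the case described above: the replica being moved is the block's only copy outside /default-rack, so any /default-rack target fails rule 3. The second call prints `true`.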


SOLUTION: Distribute the DataNodes evenly across all racks. If that is not possible, add storage to the full nodes, or add new DataNodes to the racks whose nodes are full (here /Example1 and /Example2).
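The even-distribution target in the solution is simple arithmetic. The following is a minimal Java sketch that computes each rack's target and how many nodes it gains or loses. It assumes evenness is measured by DataNode count only, ignoring per-node capacity, and uses hypothetical helper names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: even per-rack DataNode targets by node count only. */
public class EvenRackPlanSketch {

    /** Target node count per rack when the total is spread as evenly as possible. */
    static Map<String, Integer> evenTargets(Map<String, Integer> nodesPerRack) {
        int total = 0;
        for (int n : nodesPerRack.values()) {
            total += n;
        }
        int base = total / nodesPerRack.size();
        int extra = total % nodesPerRack.size();
        Map<String, Integer> targets = new LinkedHashMap<>();
        int i = 0;
        for (String rack : nodesPerRack.keySet()) {
            // The first `extra` racks absorb the remainder, one node each.
            targets.put(rack, base + (i++ < extra ? 1 : 0));
        }
        return targets;
    }

    /** Positive = nodes the rack should gain, negative = nodes it should lose. */
    static Map<String, Integer> delta(Map<String, Integer> nodesPerRack) {
        Map<String, Integer> targets = evenTargets(nodesPerRack);
        Map<String, Integer> d = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : nodesPerRack.entrySet()) {
            d.put(e.getKey(), targets.get(e.getKey()) - e.getValue());
        }
        return d;
    }

    /** The rack layout reported in this article. */
    static Map<String, Integer> articleLayout() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("/default-rack", 91);
        m.put("/Example1", 18);
        m.put("/Example2", 2);
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> layout = articleLayout();
        System.out.println("targets: " + evenTargets(layout));
        System.out.println("delta:   " + delta(layout));
    }
}
```

For the layout above (111 DataNodes across 3 racks), the sketch gives a target of 37 nodes per rack: 54 fewer on /default-rack, 19 more on /Example1 and 35 more on /Example2.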
