
PROBLEM: The HDFS balancer exits within a few minutes without moving any blocks.

SYMPTOMS: The balancer exits with the following messages:

16/11/22 07:08:29 DEBUG ipc.Client: IPC Client (280134559) connection to ma2-gbit-lnn51.corp.apple.com/10.184.67.21:8020 from hdfs-BD_TEST2@HADOOP.GCSKDC.CORP.APPLE.COM got value #1193
16/11/22 07:08:29 DEBUG ipc.ProtobufRpcEngine: Call: getBlocks took 2486ms
No block has been moved for 5 iterations. Exiting...
Nov 22, 2016 7:08:29 AM           4                  0 B            35.86 TB             200 GB

ROOT CAUSE: The DataNodes were distributed across racks as follows:

/default-rack : 91 
/Example1 : 18 
/Example2 : 2 

The nodes at 100% utilization, which we were trying to free space on, were the 20 nodes registered with racks /Example1 and /Example2. Under the balancer's rack-awareness rules for block placement, quoted below (rule 3 is the one that applies here), not even a single block could be moved without compromising fault tolerance.

  /**
   * Decide if the block is a good candidate to be moved from source to target.
   * A block is a good candidate if
   * 1. the block is not in the process of being moved/has not been moved;
   * 2. the block does not have a replica on the target;
   * 3. doing the move does not reduce the number of racks that the block has
   */
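
To make rule 3 concrete, here is a minimal sketch of the rack-count check. It is a simplified illustration, not the Hadoop Balancer source: the class RackRuleSketch, the method keepsRackCount, and the sample replica layout are all hypothetical, and racks are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Simplified illustration only; this is NOT the Hadoop Balancer implementation.
class RackRuleSketch {

    /**
     * Returns true if moving one replica from sourceRack to targetRack leaves
     * the block on at least as many distinct racks as before the move.
     * replicaRacks lists the rack of every current replica, including the
     * replica on the source node.
     */
    static boolean keepsRackCount(List<String> replicaRacks,
                                  String sourceRack, String targetRack) {
        int racksBefore = new HashSet<>(replicaRacks).size();

        // Simulate the move: drop one replica on the source rack, add the target.
        List<String> after = new ArrayList<>(replicaRacks);
        after.remove(sourceRack); // removes the first matching entry only
        after.add(targetRack);

        int racksAfter = new HashSet<>(after).size();
        return racksAfter >= racksBefore;
    }

    public static void main(String[] args) {
        // Hypothetical block: two replicas on /default-rack, one on /Example1.
        List<String> racks =
            Arrays.asList("/default-rack", "/default-rack", "/Example1");

        // Moving the /Example1 replica to /default-rack drops the block to one rack.
        System.out.println(keepsRackCount(racks, "/Example1", "/default-rack")); // false

        // Moving it to another rack that holds no replica keeps two racks.
        System.out.println(keepsRackCount(racks, "/Example1", "/Example2")); // true
    }
}
```

In this hypothetical layout, the only move that passes the check targets /Example2. In the cluster described above that rack's nodes were already full, which is consistent with the balancer finding no block to move.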

SOLUTION: Distribute nodes evenly across all racks. If this is not possible, add storage to the full nodes or add new DataNodes to the affected racks.

Revision 1 of 1. Last update: 12-23-2016 11:24 AM