
Replicas of all newly written blocks are placed on newly added nodes in the default rack on a rack-aware HDFS cluster


I recently added several new datanodes to a rack-aware HDFS cluster (v3.0.0) with replication factor 2.


All pre-existing nodes were already assigned to a rack (with names such as /cabinet1/rack1, /cabinet1/rack2, /cabinet2/rack1, /cabinet2/rack2, etc., more than 10 different racks in total). The newly added nodes are not assigned to any rack, so they are placed in the default rack, /default/default.
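
For reference, the rack-to-datanode mapping the namenode is actually using can be checked as follows (hdfs dfsadmin -printTopology is a standard command; the /default/default path is the one mentioned above):

# Print the rack-to-datanode mapping currently known to the namenode;
# nodes without an assigned rack should show up under /default/default
hdfs dfsadmin -printTopology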


We ran the balancer for some time to move existing data onto the new nodes and equalize storage utilization across all nodes. After a while, as new data was written to HDFS, we noticed a serious skew in the cluster: the storage used by the newly added datanodes is nearly 10-15% ahead of that of the older nodes, and the difference keeps increasing.
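
For anyone who wants to reproduce this, the balancing and the per-node usage comparison were done along these lines (a sketch; the 5% threshold is an illustrative value, not necessarily the one we used):

# Rebalance until every datanode is within 5% of the cluster-average utilization
hdfs balancer -threshold 5
# Compare per-datanode utilization afterwards
hdfs dfsadmin -report | grep -E 'Name:|DFS Used%'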


Checking the namenode logs, we observed that more than 60% of the second replicas are placed on the new nodes, as shown below:

INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_number1_number2, replicas=<not_yet_assigned_host_in_default_rack>:9866, <host_assigned_to_a_rack>:9866 for <file>
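
The ratio can be reproduced by counting log lines along these lines (a sketch; the log file name and the newnodes.txt host list are hypothetical placeholders, and the second count includes any allocation that lists one of the new nodes as a replica target):

# Total block allocations logged by the namenode
grep -c 'BLOCK\* allocate' hadoop-hdfs-namenode.log
# Allocations that place a replica on one of the new default-rack nodes;
# newnodes.txt is a hypothetical file with one new hostname per line
grep 'BLOCK\* allocate' hadoop-hdfs-namenode.log | grep -c -F -f newnodes.txt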

Lines in which the block is allocated on two nodes that are each assigned to a different non-default rack come with a NODE_TOO_BUSY log, as shown below:

INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_number1_number2, replicas=<host_assigned_to_rack1>:9866, <host_assigned_to_rack_2>:9866 for <file>
INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not enough replicas was chosen. Reason:{NODE_TOO_BUSY=5}

Yet blk_number1_number2 of <file> does seem to be allocated on host_assigned_to_rack1 and host_assigned_to_rack_2 when queried with:

hdfs fsck <file> -files -blocks -replicaDetails
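
The rack of each replica can also be read directly from the fsck output; the -racks flag prints the rack path next to each datanode location:

hdfs fsck <file> -files -blocks -racks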


dfs.block.replicator.classname is set to the default, org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.
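
For completeness, the effective value can be confirmed from the command line (note that hdfs getconf reports the configuration visible to the client tools, which may differ from the running namenode's if configs changed after startup):

hdfs getconf -confKey dfs.block.replicator.classname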

The DFSClients writing to HDFS are all NONMAPREDUCE and are not local to the datanodes.

Currently every rack has 3 nodes; only the default rack has 5. A skew of 10-15 percent should not occur just because one of the racks has 2 more nodes (i.e., 66% more nodes than the others).
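
As a back-of-the-envelope check (assuming exactly 10 non-default racks of 3 nodes each, 35 nodes in total), the default rack holds only about 14% of all nodes, nowhere near the 60%+ share of second replicas observed in the logs:

# 5 default-rack nodes out of 10*3 + 5 = 35 total nodes
echo 'scale=3; 5 / (10*3 + 5)' | bc    # prints .142, i.e. ~14%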


Is there a solution to or an explanation for this problem? I could not find any documentation indicating that block placement favors the default rack when there are nodes in it.

Any help would be appreciated.
