Rack awareness question

Explorer

Hi All,

We plan to change the rack awareness configuration of our production cluster. At the moment each production rack contains 14 Hadoop nodes, and we use a replication factor of 2. We want to move those racks to another data center room and reduce each rack from 14 to 10 Hadoop nodes, but we are worried about the distribution and replication of the existing data. Do we need to restart the DataNode service and rebalance the data after changing the configuration? Is there any guidance or documentation on changing the rack awareness configuration?

Regards,

i9um0

1 ACCEPTED SOLUTION

Re: rack awareness question

Contributor

Hi,

The "clean" way is to decommission the node, so that nothing new is written to it and no block is put at risk.

However, if all you need is to keep 2 replicas of every block, you can increase the replication factor to 3, wait until all blocks are fully replicated, and then simply stop a DataNode before moving it. That way you always keep the minimum of 2 replicas you are looking for. This is much faster, but not as clean as decommissioning.

Also, if you move the DataNode fast enough, most of the blocks on it will simply be reused when it reconnects.
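
To make the reasoning above concrete, here is a minimal Python sketch that models replica counts only. It does not talk to HDFS; the block IDs, node names and the `blocks_at_risk` helper are hypothetical and exist only for illustration. On a real cluster the replication change is typically made with `hdfs dfs -setrep`, and `hdfs fsck` reports under-replicated blocks.

```python
from typing import Dict, Set


def blocks_at_risk(placement: Dict[str, Set[str]],
                   stopped_node: str,
                   min_replicas: int = 2) -> Set[str]:
    """Return the blocks that fall below min_replicas if stopped_node goes offline."""
    at_risk = set()
    for block_id, nodes in placement.items():
        live_replicas = nodes - {stopped_node}
        if len(live_replicas) < min_replicas:
            at_risk.add(block_id)
    return at_risk


# Hypothetical placement with a replication factor of 2.
rf2 = {"blk_1": {"dn1", "dn2"}, "blk_2": {"dn2", "dn3"}}

# The same blocks after raising replication to 3 and waiting for
# re-replication to finish (assumed placement).
rf3 = {"blk_1": {"dn1", "dn2", "dn4"}, "blk_2": {"dn2", "dn3", "dn1"}}

print(sorted(blocks_at_risk(rf2, "dn2")))  # ['blk_1', 'blk_2']
print(sorted(blocks_at_risk(rf3, "dn2")))  # []
```

With replication 2, stopping `dn2` leaves both blocks with a single live replica. With replication 3, the same stop still leaves 2 replicas of each block, which is the point of raising the factor before the move.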

 

JM

3 REPLIES

Re: rack awareness question

Contributor

Hi,

So, a few things.

First, if you allow me, I would challenge the replication factor. Is there any reasoning behind it? With a replication factor of 2, once you lose any drive you are at risk of data loss if you lose another drive on another node. That's why it is usually recommended to keep it at 3.

Now, regarding the balancing.

The HDFS balancer uses the same policy as block placement when new blocks are created. The balancer threshold is based on disk space usage, so if all the nodes are still perfectly balanced, the balancer will exit, even if some blocks have all their replicas in the same rack.
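
As a rough illustration of the disk-space criterion, the sketch below treats a cluster as balanced when every node's utilization is within a threshold (in percentage points) of the overall cluster utilization. This is a simplified model, not the balancer's code; the node names and the `is_cluster_balanced` helper are hypothetical, and the default of 10 is assumed to match the balancer's default threshold.

```python
from typing import Dict, Tuple


def is_cluster_balanced(nodes: Dict[str, Tuple[float, float]],
                        threshold_pct: float = 10.0) -> bool:
    """nodes maps a node name to (used_bytes, capacity_bytes)."""
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_util = 100.0 * total_used / total_capacity
    # Every node must sit within threshold_pct points of the cluster average.
    return all(
        abs(100.0 * used / capacity - cluster_util) <= threshold_pct
        for used, capacity in nodes.values()
    )


# Hypothetical nodes: evenly filled, so nothing would be moved.
even = {"dn1": (50.0, 100.0), "dn2": (50.0, 100.0), "dn3": (50.0, 100.0)}

# One node far above the cluster average of 50%.
skewed = {"dn1": (90.0, 100.0), "dn2": (30.0, 100.0), "dn3": (30.0, 100.0)}

print(is_cluster_balanced(even))    # True
print(is_cluster_balanced(skewed))  # False
```

Rack assignment is not an input to this check at all, which is why an evenly filled cluster counts as balanced no matter how the replicas are spread across racks.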

 

When balancing a block, the balancer makes sure the block goes to a node that doesn't reduce the number of racks the block is on. However, that still allows a block to move from one node to another within the same rack instead of being spread across multiple racks.

 

From the Javadoc of the code:

Decide if the block is a good candidate to be moved from source to target. A block is a good candidate if
1. the block is not in the process of being moved/has not been moved;
2. the block does not have a replica on the target;
3. doing the move does not reduce the number of racks that the block has.
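
The three conditions quoted above can be sketched in Python as follows. This is a simplified model written for this thread, not the Hadoop implementation; the function name, its parameters and the rack layout are all hypothetical.

```python
from typing import Dict, Set


def is_good_candidate(block_id: str,
                      replica_nodes: Set[str],
                      source: str,
                      target: str,
                      rack_of: Dict[str, str],
                      moved_blocks: Set[str]) -> bool:
    """Model of the three quoted conditions for moving one replica."""
    # 1. The block is not being moved and has not been moved already.
    if block_id in moved_blocks:
        return False
    # 2. The target does not already hold a replica of the block.
    if target in replica_nodes:
        return False
    # 3. The move does not reduce the number of racks the block is on.
    racks_before = {rack_of[node] for node in replica_nodes}
    nodes_after = (replica_nodes - {source}) | {target}
    racks_after = {rack_of[node] for node in nodes_after}
    return len(racks_after) >= len(racks_before)


# Hypothetical topology: three nodes in rack A, one in rack B.
rack_of = {"dn1": "/rackA", "dn2": "/rackA", "dn3": "/rackA", "dn4": "/rackB"}

# Both replicas in rack A: a move inside rack A is still allowed.
print(is_good_candidate("blk_1", {"dn1", "dn2"}, "dn1", "dn3", rack_of, set()))  # True

# Replicas on two racks: pulling the rack B replica into rack A is rejected.
print(is_good_candidate("blk_2", {"dn1", "dn4"}, "dn4", "dn3", rack_of, set()))  # False
```

The first call shows the limitation described above: the rule only forbids reducing the rack count, so a block whose replicas all sit in one rack can be moved around inside that rack without ever being spread out.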

 

So even if you restart your DataNodes and rebalance your cluster, you might still have some blocks with all their replicas in the same rack.

I don't see any quick and easy solution, and a replication factor of 2 adds risk to the operation. One solution is to decommission the extra nodes in each rack before moving them and assigning them to a new rack. That way, those nodes will be empty when they rejoin, and the balancer will assign some blocks to them. But this is a long process that will require a lot of network transfer and time.
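
For the "assigning them to a new rack" step, plain Apache Hadoop resolves racks through an executable script named by the `net.topology.script.file.name` property: the script receives one or more addresses as arguments and prints one rack path per argument. Below is a minimal sketch of such a script in Python. The IP addresses and rack names are hypothetical, and on a cluster managed by Cloudera Manager the rack is normally assigned through the manager rather than a hand-written script, so treat this as an illustration only.

```python
#!/usr/bin/env python3
import sys
from typing import Dict, List

# Rack returned for any host that is not listed below.
DEFAULT_RACK = "/default-rack"

# Hypothetical host-to-rack mapping after the move (assumed addresses).
RACK_BY_HOST: Dict[str, str] = {
    "10.0.1.11": "/room2/rack1",
    "10.0.1.12": "/room2/rack1",
    "10.0.2.11": "/room2/rack2",
}


def resolve_racks(hosts: List[str]) -> List[str]:
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop passes the addresses as command-line arguments.
    for rack in resolve_racks(sys.argv[1:]):
        print(rack)
```

Keeping the mapping in one table makes it easy to review which of the moved nodes end up in which rack before the change is applied.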

 

HTH,

 

JMS

Re: rack awareness question

Explorer

Hi,

If we increase replication from 2 to 3, which is the better way to detach a node from the cluster: stopping the service or decommissioning the node?

 

 

regards,

i9um0
