Created 10-24-2018 08:10 AM
I need to remove one faulty DataNode from the cluster and want to shut it down permanently so that all of its data gets replicated to the other DataNodes. Will a rebalance start automatically if I stop the faulty DataNode? Also, what is the best way to do this?
Typically, with a replication factor of 3, your faulty DataNode can be removed with no impact, because 2 other copies of each block should still be available. Just to understand your setup, how many DataNodes are we talking about here?
Running the rebalancer will do the job, but with 2 TB it will take a while, depending on your data center bandwidth.
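As a rough back-of-the-envelope sketch (not an HDFS tool), the time needed to move the data scales with the data volume divided by the usable network bandwidth. The bandwidth figure and the 50% efficiency factor below are assumptions for illustration only; real HDFS throughput is also limited by throttling settings and disk I/O.

```python
def estimate_transfer_hours(data_tb: float, bandwidth_gbps: float,
                            efficiency: float = 0.5) -> float:
    """Rough time (hours) to move data_tb terabytes over a bandwidth_gbps link.

    efficiency is an ASSUMED fraction of the raw link that HDFS block
    transfers actually get (competing traffic, throttling, disk limits).
    """
    if bandwidth_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    bits = data_tb * 1e12 * 8                       # decimal TB -> bits
    effective_bps = bandwidth_gbps * 1e9 * efficiency
    return bits / effective_bps / 3600              # seconds -> hours


# 2 TB over a 1 Gbps link at the assumed 50% efficiency
print(f"{estimate_transfer_hours(2, 1):.1f} h")     # 8.9 h
```

Doubling the usable bandwidth halves the estimate, which is why the same move can take minutes on one network and many hours on another.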
HDFS provides a “balancer” utility to help balance the blocks across DataNodes in the cluster. To initiate a balancing process, follow these steps:
In Ambari Web, browse to Services > HDFS > Summary.
Click Service Actions, and then click Rebalance HDFS.
Enter the Balance Threshold value as a percentage of disk capacity.
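Outside Ambari, the same balancer can be started from the command line with `hdfs balancer -threshold <percent>`, where the threshold is how far (in percent of disk capacity) each DataNode's utilization may deviate from the cluster average. A minimal Python sketch that builds and runs that command, assuming the `hdfs` client is on the PATH and the script runs as a user allowed to run the balancer (typically `hdfs`). The 1 to 100 range check is an assumption about what the balancer accepts.

```python
import subprocess
from typing import List


def balancer_command(threshold_pct: float = 10.0) -> List[str]:
    """Build the argv for `hdfs balancer -threshold <pct>` (10% is the usual default)."""
    # Assumed valid range; the balancer rejects out-of-range thresholds.
    if not 1 <= threshold_pct <= 100:
        raise ValueError("threshold must be a percentage between 1 and 100")
    # :g renders 10.0 as "10" and 5.5 as "5.5"
    return ["hdfs", "balancer", "-threshold", f"{threshold_pct:g}"]


def run_balancer(threshold_pct: float = 10.0) -> int:
    """Run the balancer and return its exit code (needs a configured HDFS client)."""
    return subprocess.run(balancer_command(threshold_pct), check=False).returncode
```

For example, `run_balancer(5)` asks for a tighter balance than the default 10%, which moves more data and therefore takes longer.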
It is recommended to run the balancer when cluster load is low; otherwise you will notice high NameNode RPC load while the balancer is executing.
Here is a document that can help you tune the HDFS Balancer
Created 10-24-2018 01:13 PM
@Geoffrey Shelton Okot, thanks for the reply.
We have 6 DataNodes in this cluster, with 2 TB of data on each node.
Also, when the DataNode is shut down, does rebalancing start automatically, or do we have to initiate it ourselves?
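When you stop the DataNode, the NameNode marks it dead once its heartbeats time out (about 10.5 minutes with default settings) and then automatically re-replicates its blocks onto the remaining DataNodes. Strictly speaking this is block re-replication rather than the balancer, but it still takes up network resources. Make sure you are not doing it under heavy load, as running jobs will be affected. There are also checks against moving blocks that running jobs may be using.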
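Re-replication of a stopped DataNode's blocks does not start the instant the node goes down: the NameNode first has to declare it dead. A small sketch of that timeout, using the Hadoop properties `dfs.heartbeat.interval` (seconds, default 3) and `dfs.namenode.heartbeat.recheck-interval` (milliseconds, default 300000). It follows the commonly documented formula; check the hdfs-default.xml for your Hadoop version before relying on it.

```python
def dead_node_timeout_seconds(heartbeat_interval_s: float = 3.0,
                              recheck_interval_ms: float = 300_000) -> float:
    """Seconds after the last heartbeat before the NameNode marks a DataNode dead.

    heartbeat_interval_s -> dfs.heartbeat.interval (seconds)
    recheck_interval_ms  -> dfs.namenode.heartbeat.recheck-interval (milliseconds)
    """
    # 2 * recheck-interval + 10 * heartbeat-interval
    return 2 * (recheck_interval_ms / 1000.0) + 10 * heartbeat_interval_s


print(dead_node_timeout_seconds() / 60)  # 10.5 (minutes, with the defaults)
```

Lowering the recheck interval shortens this wait, at the cost of declaring nodes dead more eagerly during brief network hiccups.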
Decommissioning is a more elegant way to trigger re-replication with zero data block loss: your unwanted node is marked as decommissioned only once all of its blocks have been replicated to other DataNodes.
This is the more commonly used option.
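A minimal sketch of the manual decommissioning steps, assuming a setup where `dfs.hosts.exclude` in hdfs-site.xml points at an exclude file: add the host to that file, then run `hdfs dfsadmin -refreshNodes` so the NameNode starts decommissioning it. The helper names, file path, and hostname are hypothetical. In Ambari, the equivalent is the Decommission action on the host's DataNode component, which manages the exclude file for you.

```python
import subprocess
from pathlib import Path
from typing import List


def add_to_exclude_file(exclude_file: Path, hostname: str) -> bool:
    """Append hostname to the dfs.hosts.exclude file unless already listed.

    Returns True if the file was changed.
    """
    text = exclude_file.read_text() if exclude_file.exists() else ""
    if hostname in text.split():
        return False
    # Avoid gluing the new host onto a last line that lacks a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with exclude_file.open("a") as f:
        f.write(prefix + hostname + "\n")
    return True


def refresh_nodes_command() -> List[str]:
    # Makes the NameNode re-read its include/exclude files.
    return ["hdfs", "dfsadmin", "-refreshNodes"]


def decommission(exclude_file: Path, hostname: str) -> None:
    """Hypothetical helper: exclude the host, then tell the NameNode to act on it."""
    add_to_exclude_file(exclude_file, hostname)
    # Refresh even if the host was already listed, so the NameNode picks it up.
    subprocess.run(refresh_nodes_command(), check=True)
```

Once `hdfs dfsadmin -report` shows the node's decommission status as Decommissioned, it can be stopped and removed without leaving under-replicated blocks behind.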
Created 10-29-2018 04:56 AM
@Geoffrey Shelton Okot, the activity has finished. The balancer started as soon as I stopped my DataNode, but it took a very long time to complete.
Is there any way to make this activity run faster?