After shutting down a DataNode permanently, will HDFS automatically re-balance its data onto the other DataNodes, and how long does it take to re-balance 2 TB of data?

Expert Contributor

I need to remove one faulty DataNode from the cluster and want to shut it down permanently so that all of its data gets replicated to the other DataNodes. Will re-balancing start automatically if I stop the faulty DataNode? Also, what is the best way to do this?

@Jay Kumar SenSharma

Master Mentor

@hardik desai

Typically, with a replication factor of 3, your faulty DataNode can be removed with no impact, as 2 other copies of each block should be available. Just to understand your setup, how many DataNodes are we talking about here?

Running the balancer will do the job, but with 2 TB it will take a while, depending on your data center bandwidth.
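As a rough illustration of why 2 TB takes a while, a back-of-the-envelope estimate divides the data volume by the sustained throughput. The throughput figures below are hypothetical; real runs are also throttled by the balancer bandwidth setting and slowed by cluster load, so treat the result as a lower bound rather than a prediction.

```python
def estimate_transfer_hours(data_tb: float, throughput_mb_per_s: float) -> float:
    """Rough lower bound on copy time: data volume / sustained throughput."""
    data_mb = data_tb * 1024 * 1024  # binary units: 1 TB = 1024 * 1024 MB
    seconds = data_mb / throughput_mb_per_s
    return seconds / 3600


# Hypothetical aggregate throughputs; substitute what your network actually sustains.
for mb_per_s in (10, 100, 500):
    print(f"{mb_per_s:>4} MB/s -> {estimate_transfer_hours(2, mb_per_s):.1f} h")
```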

Rebalancing HDFS

HDFS provides a “balancer” utility to help balance the blocks across DataNodes in the cluster. To initiate a balancing process, follow these steps:

In Ambari Web, browse to Services > HDFS > Summary.

Click Service Actions, and then click Rebalance HDFS.

Enter the Balance Threshold value as a percentage of disk capacity.

Click Start.
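The threshold works relative to the cluster average: the HDFS balancer treats the cluster as balanced when every DataNode's utilization (used / capacity) is within the threshold percentage of the whole cluster's utilization. The sketch below is a simplified model of that test, not the balancer's actual code; the node names and sizes are hypothetical.

```python
def classify_nodes(nodes: dict[str, tuple[float, float]], threshold_pct: float) -> dict[str, str]:
    """Simplified model of the balancer's threshold test.

    nodes maps a DataNode name to (used, capacity), both in the same unit.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    cluster_util = 100.0 * total_used / total_capacity  # cluster-wide utilization, %

    status = {}
    for name, (used, cap) in nodes.items():
        node_util = 100.0 * used / cap
        if node_util > cluster_util + threshold_pct:
            status[name] = "over-utilized"
        elif node_util < cluster_util - threshold_pct:
            status[name] = "under-utilized"
        else:
            status[name] = "within threshold"
    return status


# Hypothetical cluster: five busy nodes plus one freshly added, empty node (TB used, TB capacity).
cluster = {f"dn{i}": (2.4, 8.0) for i in range(1, 6)}
cluster["dn6"] = (0.0, 8.0)
print(classify_nodes(cluster, threshold_pct=10))
```

A smaller threshold gives a more even result but makes the balancer move more blocks, so it runs longer.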

We recommend running the balancer when cluster load is low; otherwise you'll notice high NameNode RPC load while the balancer is executing.

Here is a document that can help you tune the HDFS balancer.
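From the command line, the equivalent is `hdfs balancer -threshold <percent>`, and the bandwidth each DataNode may use for balancing can be raised at runtime with `hdfs dfsadmin -setBalancerBandwidth <bytes per second>`. That runtime value is not persisted across DataNode restarts, where `dfs.datanode.balance.bandwidthPerSec` applies again. The sketch only builds the argument lists; the 10 % threshold and 100 MB/s are illustrative values, not recommendations.

```python
import shlex


def balancer_commands(threshold_pct: int, bandwidth_mb_per_s: int) -> list[list[str]]:
    """Build (but do not run) the CLI calls for a throttled balancer run."""
    bandwidth_bytes = bandwidth_mb_per_s * 1024 * 1024  # -setBalancerBandwidth takes bytes/s
    return [
        ["hdfs", "dfsadmin", "-setBalancerBandwidth", str(bandwidth_bytes)],
        ["hdfs", "balancer", "-threshold", str(threshold_pct)],
    ]


for cmd in balancer_commands(threshold_pct=10, bandwidth_mb_per_s=100):
    print(shlex.join(cmd))
    # On a cluster node you could run it with: subprocess.run(cmd, check=True)
```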

Expert Contributor

@Geoffrey Shelton Okot Thanks for the reply.

We have 6 DataNodes in this cluster, with 2 TB of data on each node.

Also, when the DataNode is shut down, does re-balancing start automatically, or do we have to initiate it?

Thanks,
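For a cluster of this shape, a quick calculation shows how much each surviving node has to absorb. This sketch assumes the 2 TB per node is raw used space (replicas included) and that HDFS spreads the re-created replicas evenly, which real block placement only approximates.

```python
def redistribution(total_nodes: int, data_per_node_tb: float, removed: int = 1) -> tuple[int, float, float]:
    """Estimate how much data the remaining DataNodes must absorb."""
    remaining = total_nodes - removed
    data_to_recreate_tb = data_per_node_tb * removed  # replicas that lived on the removed node(s)
    extra_per_node_tb = data_to_recreate_tb / remaining  # assumes an even spread
    return remaining, data_to_recreate_tb, extra_per_node_tb


remaining, moved, extra = redistribution(total_nodes=6, data_per_node_tb=2.0)
print(f"{moved} TB re-created across {remaining} nodes, about {extra:.1f} TB extra each")
```

Under these assumptions each remaining node ends up near 2.4 TB, so it is worth confirming the surviving disks have that much headroom before removing the node.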

Master Mentor

@hardik desai

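When you shut the DataNode down, the NameNode marks it dead after a timeout and automatically starts re-replicating its blocks onto the other DataNodes; the balancer itself is not started automatically and has to be run separately. This re-replication takes up network resources, so make sure you are not doing it under heavy load, as running jobs will be affected. There are also checks against moving blocks that current jobs may be using.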
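Re-replication after a shutdown does not begin the instant the DataNode stops: the NameNode first waits for the heartbeat expiry, computed as 2 × `dfs.namenode.heartbeat.recheck-interval` + 10 × `dfs.heartbeat.interval`. The sketch below just evaluates that formula; the defaults shown (5 minutes and 3 seconds) are the stock values, so substitute your cluster's settings if they were changed.

```python
def dead_node_timeout_s(recheck_interval_ms: int = 300_000, heartbeat_interval_s: int = 3) -> float:
    """Seconds after the last heartbeat before the NameNode marks a DataNode dead.

    Defaults mirror dfs.namenode.heartbeat.recheck-interval (ms) and
    dfs.heartbeat.interval (s).
    """
    expire_ms = 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s
    return expire_ms / 1000


timeout = dead_node_timeout_s()
print(timeout, "s =", timeout / 60, "min")  # 630.0 s = 10.5 min
```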

Decommissioning is a more elegant way to trigger re-replication with zero data block loss: the unwanted node is marked as decommissioned only once all of its blocks have been replicated elsewhere.

This is the more commonly used option.
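Outside Ambari, decommissioning usually means adding the host to the file named by `dfs.hosts.exclude` on the NameNode and then running `hdfs dfsadmin -refreshNodes`; progress appears as "Decommission Status" in `hdfs dfsadmin -report`, and the DataNode should only be stopped once it reads Decommissioned. On an Ambari-managed cluster, the Decommission action on the DataNode in Ambari is the safer route, since Ambari may manage that file itself. This is a minimal sketch of the manual steps; the file path and hostname are hypothetical.

```python
import tempfile
from pathlib import Path


def add_to_exclude_file(exclude_file: Path, hostname: str) -> list[str]:
    """Add hostname to the DataNode exclude file (idempotently) and return
    the command that asks the NameNode to re-read it."""
    hosts = exclude_file.read_text().split() if exclude_file.exists() else []
    if hostname not in hosts:
        hosts.append(hostname)
        exclude_file.write_text("\n".join(hosts) + "\n")
    # After the refresh, the NameNode starts decommissioning the listed host.
    return ["hdfs", "dfsadmin", "-refreshNodes"]


# Demo against a temporary file; on a real cluster the path is whatever
# dfs.hosts.exclude points to, and "dn6.example.com" is a hypothetical host.
with tempfile.TemporaryDirectory() as tmp:
    exclude = Path(tmp) / "dfs.exclude"
    cmd = add_to_exclude_file(exclude, "dn6.example.com")
    add_to_exclude_file(exclude, "dn6.example.com")  # second call changes nothing
    print(exclude.read_text().split(), " ".join(cmd))
```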

Master Mentor

@hardik desai

Any updates? If my response answered your question, could you take the time to "accept" it so that other HCC members can reference it?

Expert Contributor

@Geoffrey Shelton Okot The activity has finished. Re-balancing started as soon as I stopped my DataNode, but it took a long time to complete.

Is there any way to perform this activity faster?

Thanks,
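Re-replication speed is governed mainly by NameNode settings such as `dfs.namenode.replication.max-streams`, `dfs.namenode.replication.max-streams-hard-limit` and `dfs.namenode.replication.work.multiplier.per.iteration`, while balancer speed depends on `dfs.datanode.balance.bandwidthPerSec`. The sketch below only renders these as `hdfs-site.xml` properties; the values are illustrative assumptions, not recommendations. Raising them increases network and disk load, changes may require a service restart, and on Ambari they would normally be set through the HDFS configs rather than by editing the XML by hand.

```python
import xml.etree.ElementTree as ET


def hdfs_site_properties(props: dict[str, str]) -> str:
    """Render name/value pairs as an hdfs-site.xml <configuration> fragment."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; tune against your own network and disk capacity.
tuning = {
    "dfs.namenode.replication.max-streams": "10",
    "dfs.namenode.replication.max-streams-hard-limit": "20",
    "dfs.namenode.replication.work.multiplier.per.iteration": "10",
    "dfs.datanode.balance.bandwidthPerSec": str(100 * 1024 * 1024),  # bytes/s
}
print(hdfs_site_properties(tuning))
```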