after shutting down data node permanently , will it automatically start doing re-balance of hdfs data on other data nodes & how much time does it take to re-balance 2 TB of data?

Contributor

I need to remove one faulty data node from the cluster. I want to shut it down permanently and have all of its data replicated on the other data nodes. Will the re-balance start automatically if I stop the faulty data node? Also, what is the best way to do this?

@Jay Kumar SenSharma

Mentor

@hardik desai

Typically, with a replication factor of 3, your faulty data node can be removed with no impact, as 2 other copies of each block should be available. Just to understand your setup: how many data nodes are we talking about here?

Running the balancer will do the job, but with 2 TB it will take a while, depending on your data center bandwidth.
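
For a rough sense of what "a while" means, here is a back-of-the-envelope sketch in Python. It assumes the copy rate is limited by a per-node throughput cap and that the remaining nodes share the work evenly. The function name and the example figures (5 nodes, 10 MiB/s each) are illustrative assumptions, not measurements from any real cluster.

```python
def estimate_copy_hours(data_tib: float, nodes: int, mib_per_sec_per_node: float) -> float:
    """Rough lower bound, in hours, on the time to move `data_tib` TiB of blocks.

    Assumes every node copies at a steady `mib_per_sec_per_node` and the work
    is spread evenly, which a real cluster will rarely achieve.
    """
    if nodes <= 0 or mib_per_sec_per_node <= 0:
        raise ValueError("nodes and mib_per_sec_per_node must be positive")
    total_mib = data_tib * 1024 * 1024          # TiB -> MiB
    aggregate_mib_per_sec = nodes * mib_per_sec_per_node
    return total_mib / aggregate_mib_per_sec / 3600


if __name__ == "__main__":
    # Hypothetical example: 2 TiB spread over 5 nodes at 10 MiB/s each.
    print(f"{estimate_copy_hours(2, 5, 10):.1f} hours")
```

Treat the result as a floor rather than a forecast: throttling settings, disk speed, and competing jobs all stretch the real duration.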

Rebalancing HDFS

HDFS provides a “balancer” utility to help balance the blocks across DataNodes in the cluster. To initiate a balancing process, follow these steps:

In Ambari Web, browse to Services > HDFS > Summary.

Click Service Actions, and then click Rebalance HDFS.

Enter the Balance Threshold value as a percentage of disk capacity.

Click Start.
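
To make the Balance Threshold concrete: per the HDFS balancer documentation, a DataNode counts as balanced when its utilization (used space divided by capacity) differs from the overall cluster utilization by no more than the threshold, in percentage points. The sketch below only illustrates that rule. The node names and usage figures are made up, and `unbalanced_nodes` is a hypothetical helper, not part of HDFS or Ambari.

```python
from typing import Dict, Tuple


def unbalanced_nodes(
    nodes: Dict[str, Tuple[float, float]], threshold_pct: float = 10.0
) -> Dict[str, float]:
    """Return {node: deviation in percentage points} for nodes outside the threshold.

    `nodes` maps a node name to (used, capacity), both in the same unit.
    A positive deviation means the node is fuller than the cluster average.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity

    outside = {}
    for name, (used, capacity) in nodes.items():
        deviation = 100.0 * used / capacity - cluster_pct
        if abs(deviation) > threshold_pct:
            outside[name] = deviation
    return outside


if __name__ == "__main__":
    # Made-up figures in TB: two fuller nodes and one nearly empty node.
    example = {"dn1": (2.0, 4.0), "dn2": (2.0, 4.0), "dn3": (0.4, 4.0)}
    print(unbalanced_nodes(example, threshold_pct=15.0))
```

A smaller threshold makes the balancer move more data to reach a tighter spread, so it runs longer; a larger one finishes sooner but leaves the nodes less even.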

It is recommended to run the balancer when the cluster load is low; otherwise you will notice high NameNode (NN) RPC load while the balancer is executing.

Here is a document that can help you tune the hdfs Balancer

Contributor

@Geoffrey Shelton Okot thanks for the reply.

We have a setup of 6 data nodes in this cluster, with 2 TB of data on each node.

Also, when the data node is shut down, does re-balancing start automatically, or do we have to initiate it ourselves?

Thanks,

Mentor

@hardik desai

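Stopping the DataNode does not start the balancer utility itself. Once the NameNode stops receiving heartbeats from the node and marks it dead, it automatically begins re-replicating the blocks that node held onto the remaining DataNodes, and this copying takes up network resources. Make sure you are not doing it under heavy load, as running jobs will be affected. There are also checks against moving blocks that current jobs may be using.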
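
One timing detail worth knowing: the NameNode does not react the instant a DataNode is stopped. It declares the node dead only after a heartbeat timeout, and only then schedules new copies of the blocks that node held. Below is a minimal sketch of that timeout calculation, assuming the stock defaults for `dfs.namenode.heartbeat.recheck-interval` (300000 ms) and `dfs.heartbeat.interval` (3 s). Check your own hdfs-site.xml, since your cluster may override them; the function name is hypothetical.

```python
def dead_node_timeout_seconds(
    recheck_interval_ms: int = 300_000,  # dfs.namenode.heartbeat.recheck-interval (assumed default)
    heartbeat_interval_s: int = 3,       # dfs.heartbeat.interval (assumed default)
) -> float:
    """Seconds without heartbeats before the NameNode treats a DataNode as dead.

    Mirrors the rule: 2 * recheck interval + 10 * heartbeat interval.
    """
    timeout_ms = 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s
    return timeout_ms / 1000


if __name__ == "__main__":
    print(dead_node_timeout_seconds() / 60, "minutes")
```

With the assumed defaults this works out to 630 seconds, so expect a delay of roughly ten minutes between stopping the node and seeing replication traffic.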

Decommissioning is a more elegant way of moving the data off the node, with zero data block loss: your undesired node is marked as decommissioned only once all of its blocks have been replicated elsewhere.

This is the more commonly used option.
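
As a rough sketch of what decommissioning involves outside of a management UI: the host is added to the exclude file that `dfs.hosts.exclude` points to, and the NameNode is then told to re-read it with `hdfs dfsadmin -refreshNodes`. The file path and hostname below are placeholders and the helper names are hypothetical. On an Ambari-managed cluster you would normally let Ambari handle this rather than edit the file by hand.

```python
import subprocess
from pathlib import Path


def add_to_exclude_file(exclude_file: Path, hostname: str) -> bool:
    """Append `hostname` to the exclude file; return False if it is already listed."""
    text = exclude_file.read_text() if exclude_file.exists() else ""
    if hostname in text.split():
        return False
    # Make sure the new entry starts on its own line.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with exclude_file.open("a") as handle:
        handle.write(f"{prefix}{hostname}\n")
    return True


def refresh_nodes_command() -> list:
    """Command that asks the NameNode to re-read its include/exclude files."""
    return ["hdfs", "dfsadmin", "-refreshNodes"]


def decommission(exclude_file: Path, hostname: str) -> None:
    """Mark `hostname` for decommissioning; must run where the `hdfs` CLI is available."""
    add_to_exclude_file(exclude_file, hostname)
    subprocess.run(refresh_nodes_command(), check=True)


if __name__ == "__main__":
    # Placeholder path and hostname; substitute the values from your own cluster.
    print(add_to_exclude_file(Path("dfs.exclude.example"), "datanode6.example.com"))
    print(" ".join(refresh_nodes_command()))
```

Progress can then be followed with `hdfs dfsadmin -report`, which lists each DataNode's decommission status; stop the DataNode process only after the node is reported as decommissioned.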

Mentor

@hardik desai

Any updates? If my response answered your question, could you take a moment to "accept" it so that other HCC members can reference it?

Contributor

@Geoffrey Shelton Okot the activity has finished. The balancer started as soon as I stopped my data node, but it took a very long time to complete.

Is there any way I can perform this activity faster?

Thanks,