Rebalancer on HDFS finishes very quickly after adding new nodes to cluster

Hi,

We have a CDH 5.5.1 cluster that had 10 DataNodes, and we recently added 3 more. Previously, running the HDFS rebalancer through Cloudera Manager took anywhere from 45 minutes to 2 hours. Since adding the new DataNodes, it completes in about 4 seconds, which does not make sense, and I see no data being moved between DataNodes.

Is there a way to resolve this, or to confirm whether the balancer is actually running? Alternatively, can we run the balancer manually instead of through Cloudera Manager?

Can anyone please help us with this?

Re: Rebalancer on HDFS finishes very quickly after adding new nodes to cluster

Master Collaborator

I'm curious why you need to run the HDFS Balancer frequently. Over time the data should balance out reasonably well without intervention, unless you are writing data to the cluster directly from nodes that are also acting as DataNodes. In that case the local node where the write originates receives a disproportionate share of the blocks.

I do not know why your balancer finishes so quickly after adding the new nodes, but are you sure those nodes were successfully added to the cluster? Has any data been written to them?
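
One thing worth ruling out, offered as a possibility rather than a diagnosis: the balancer treats a DataNode as balanced when its utilization is within the threshold (10 percentage points by default) of the cluster-wide average, and it exits almost immediately when every node already meets that condition. On a lightly used cluster, even empty new nodes can fall inside the threshold. The Python sketch below only illustrates that arithmetic; the function name, node names, and usage figures are hypothetical and not taken from your cluster.

```python
def unbalanced_nodes(nodes, threshold=10.0):
    """Return names of nodes whose utilization deviates from the cluster
    average by more than `threshold` percentage points.

    `nodes` maps a node name to a (dfs_used, capacity) pair in the same unit.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity

    result = []
    for name, (used, cap) in nodes.items():
        node_pct = 100.0 * used / cap
        if abs(node_pct - cluster_pct) > threshold:
            result.append(name)
    return result


# Hypothetical numbers: 10 old nodes at 8% used, 3 new empty nodes, 10 TB each.
lightly_used = {f"old{i}": (0.8, 10.0) for i in range(10)}
lightly_used.update({f"new{i}": (0.0, 10.0) for i in range(3)})

# Same layout, but the old nodes are 60% full (also hypothetical).
heavily_used = {f"old{i}": (6.0, 10.0) for i in range(10)}
heavily_used.update({f"new{i}": (0.0, 10.0) for i in range(3)})

print(unbalanced_nodes(lightly_used))   # no node is outside the default threshold
print(unbalanced_nodes(heavily_used))   # every node is outside the threshold
```

If the first case resembles your cluster, a lower threshold would give the balancer something to move; if it does not, the questions above about whether the nodes actually joined still apply.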

 

This doc explains how to run the balancer from the command line if you'd like to try that route.
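
As a rough sketch of that route: the balancer is started with `hdfs balancer`, and `-threshold` sets the allowed deviation from the cluster average in percentage points. The small Python wrapper below just builds and runs that command. The helper names are hypothetical, and it assumes the `hdfs` client is on the PATH of a cluster host and that you run it as a user with HDFS superuser rights (typically `hdfs`).

```python
import subprocess


def balancer_command(threshold=10.0):
    """Build the argument list for `hdfs balancer` with the given threshold."""
    return ["hdfs", "balancer", "-threshold", str(threshold)]


def run_balancer(threshold=10.0):
    """Run the balancer and return its exit code (needs a Hadoop client)."""
    return subprocess.run(balancer_command(threshold)).returncode


if __name__ == "__main__":
    # Show the command that would be run with a tighter threshold of 5.
    print(" ".join(balancer_command(5)))
```

Running the command directly in a shell works just as well; the wrapper is only convenient if you want to capture the exit code or schedule the run from a script.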