Support Questions


hadoop + how to rebalance the hdfs


We have an HDP cluster, version `2.6.5`, with `8` data nodes. All machines run RHEL 7.6.


The HDP cluster is based on the Ambari platform, version `2.6.1`.


Each data node (worker machine) has two disks, and each disk is 1.8T.


When we log in to the data node machines, we see differences in how full the disks are.


For example, on the first data node the usage is (from `df -h`):


/dev/sdb 1.8T 839G 996G 46% /grid/sdc
/dev/sda 1.8T 1014G 821G 56% /grid/sdb


On the second data node the usage is:

/dev/sdb 1.8T 1.5T 390G 79% /grid/sdc
/dev/sda 1.8T 1.5T 400G 79% /grid/sdb


On the third data node the usage is:

/dev/sdb 1.8T 1.7T 170G 91% /grid/sdc
/dev/sda 1.8T 1.7T 169G 91% /grid/sdb


and so on


The big question is: why does HDFS not rebalance the data across the HDFS disks?


*For example, we expected the used space to be about the same on all disks across all data node machines.*

Why does the used space differ between `datanode1`, `datanode2`, `datanode3`, and so on?


Any advice about HDFS tuning parameters that could help us?


*This is critical, because one disk can reach `100%` while the others are only around `50%` full.*
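To put a number on the imbalance described above, here is a minimal Python sketch that reads the `df -h` rows quoted in the question and averages the Use% column per node. The node labels and helper names (`DF_ROWS`, `used_percent`, `node_average`) are made up for illustration, and only the three nodes shown in the post are included. Note that `df` reports filesystem usage, which may differ slightly from the DFS usage that HDFS itself reports.

```python
# Illustrative only: these are the `df -h` rows quoted in the post,
# grouped under hypothetical node labels.
DF_ROWS = {
    "datanode1": [
        "/dev/sdb 1.8T 839G 996G 46% /grid/sdc",
        "/dev/sda 1.8T 1014G 821G 56% /grid/sdb",
    ],
    "datanode2": [
        "/dev/sdb 1.8T 1.5T 390G 79% /grid/sdc",
        "/dev/sda 1.8T 1.5T 400G 79% /grid/sdb",
    ],
    "datanode3": [
        "/dev/sdb 1.8T 1.7T 170G 91% /grid/sdc",
        "/dev/sda 1.8T 1.7T 169G 91% /grid/sdb",
    ],
}


def used_percent(df_row):
    """Return the Use% column of one `df -h` row as an integer."""
    # Columns: filesystem, size, used, avail, use%, mount point
    return int(df_row.split()[4].rstrip("%"))


def node_average(df_rows):
    """Average Use% across the disks of one node (all disks are 1.8T here)."""
    percents = [used_percent(row) for row in df_rows]
    return sum(percents) / len(percents)


averages = {node: node_average(rows) for node, rows in DF_ROWS.items()}
spread = max(averages.values()) - min(averages.values())

for node, avg in averages.items():
    print(f"{node}: {avg:.1f}% used on average")
print(f"gap between fullest and emptiest node: {spread:.1f} points")
```

For the three nodes shown, the gap between the fullest and the emptiest node comes out at 40 percentage points.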


Rising Star

hello Michael,

I had a similar issue with my CDH-based cluster and solved it with a rather crude workaround.
First I changed the replication factor from 3 to 2
(a notice about under-replicated blocks should appear),
then ran the rebalance (by Blockpool, then by Datanode, to shuffle blocks between data nodes),
and then set the replication factor back to 3. After that I noticed some major changes.

Not sure if that will work for you, but I wanted to share my experience in case you want to try it.
Good luck


Can you say more about "the rebalance (by Blockpool)"?


We have an HDP cluster with Ambari, so we are not sure what we need to do.



Since you are using Ambari, you can try the Rebalance HDFS action, or use the Hadoop Balancer tool directly.
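For context on what the balancer does: it is typically started from the command line as `hdfs balancer -threshold <percent>` (default threshold 10), and its `-policy` option accepts `datanode` or `blockpool`, which is likely what "by Blockpool, then by Datanode" refers to. The balancer treats a data node as balanced when its utilization is within the threshold of the cluster-wide utilization. The Python sketch below is a simplified illustration of that rule, not the balancer's actual code. It assumes equal-capacity nodes and uses only the three nodes quoted in the question (51%, 79%, and 91% average disk usage), so the real cluster average across all 8 nodes will differ. The function name `classify_nodes` is made up for this example.

```python
def classify_nodes(node_util, threshold=10.0):
    """Label each node relative to the cluster average utilization.

    node_util maps node name -> used percent. With equal-capacity nodes
    (an assumption here), cluster utilization is the plain average.
    """
    cluster_avg = sum(node_util.values()) / len(node_util)
    labels = {}
    for node, util in node_util.items():
        if util > cluster_avg + threshold:
            labels[node] = "over-utilized"
        elif util < cluster_avg - threshold:
            labels[node] = "under-utilized"
        else:
            labels[node] = "within threshold"
    return cluster_avg, labels


# Average Use% per node, taken from the df output quoted in the question.
sample = {"datanode1": 51.0, "datanode2": 79.0, "datanode3": 91.0}

cluster_avg, labels = classify_nodes(sample, threshold=10.0)
print(f"cluster average: {cluster_avg:.1f}%")
for node, label in labels.items():
    print(f"{node}: {sample[node]:.0f}% -> {label}")
```

With these sample numbers, `datanode1` falls below the band and `datanode3` above it, so a balancer run with the default threshold would be expected to move blocks from the fuller nodes toward the emptier ones. A lower threshold gives a tighter balance but a longer run.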