
Hadoop + how to rebalance HDFS


We have an HDP cluster, version `2.6.5`, with `8` data nodes. All machines run RHEL 7.6.


The HDP cluster is based on the Ambari platform, version `2.6.1`.


Each data node (worker machine) has two disks, and each disk is 1.8T in size.


When we log in to the data-node machines, we see differences in how full the disks are.


For example, on the first data node the usage is (from `df -h`):


/dev/sdb 1.8T 839G 996G 46% /grid/sdc
/dev/sda 1.8T 1014G 821G 56% /grid/sdb


On the second data node the usage is:

/dev/sdb 1.8T 1.5T 390G 79% /grid/sdc
/dev/sda 1.8T 1.5T 400G 79% /grid/sdb


On the third data node the usage is:

/dev/sdb 1.8T 1.7T 170G 91% /grid/sdc
/dev/sda 1.8T 1.7T 169G 91% /grid/sdb


And so on.


The big question is: why does HDFS not rebalance the data across the HDFS disks?


*For example, we expected the used space to be about the same on all disks on all data-node machines.*

Why does the used space differ between `datanode1`, `datanode2`, `datanode3`, etc.?


Any advice on HDFS tuning parameters that could help us?


*This is critical, because one disk can reach `100%` while the others are only around `50%` full.*


Re: Hadoop + how to rebalance HDFS


Hello Michael,

I had a similar issue with my CDH-based cluster and solved it with a rather crude workaround.
First I lowered the replication factor from 3 to 2 (a notice about under-replicated blocks should appear).
Then I ran the rebalance (by Blockpool, then by Datanode, to shuffle blocks between the data nodes).
Finally I set the replication factor back to 3, and after that I noticed some major changes.
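
In case it helps, here is a rough sketch of that sequence as a dry-run Python script. It assumes the replication factor is changed on existing files with `hdfs dfs -setrep` (on CDH this may instead have been done through the cluster manager), and the path `/` and the function names are placeholders of mine, not something from a real tool. `-policy blockpool` and `-policy datanode` are the two policies the Hadoop balancer accepts.

```python
import shlex
import subprocess


def workaround_commands(path="/", low=2, normal=3):
    """Build the command sequence: lower replication, rebalance twice, restore.

    `path` is a placeholder; pick the directory you actually want to touch.
    """
    return [
        # Lower the replication factor of existing files and wait for completion.
        ["hdfs", "dfs", "-setrep", "-w", str(low), path],
        # Rebalance by block pool, then by DataNode.
        ["hdfs", "balancer", "-policy", "blockpool"],
        ["hdfs", "balancer", "-policy", "datanode"],
        # Restore the original replication factor.
        ["hdfs", "dfs", "-setrep", "-w", str(normal), path],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands.
    run(workaround_commands())
```

Keep in mind that `-setrep -w` on a large tree can take a very long time, and that running with replication 2 lowers fault tolerance while the steps are in progress.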

I am not sure whether that will work for you, but I wanted to share my experience in case you want to try it.
Good luck 

Re: Hadoop + how to rebalance HDFS

Can you describe in more detail what you mean by "The rebalance (by Blockpool"?


We have an HDP cluster with Ambari, so we are not sure what we need to do.


Re: Hadoop + how to rebalance HDFS


Since you are using Ambari, you can try the Rebalance HDFS action, or use the Hadoop Balancer tool directly.
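
To illustrate what the balancer aims for, here is a minimal sketch (not the balancer's actual code). The balancer has to be started explicitly, and it moves blocks between DataNodes until each node's utilization is within a threshold of the cluster average (10 percentage points by default). It balances between nodes, not between the disks inside one node. The percentages below are read off the `df -h` output in the question, which only approximates DFS usage, and cover just three of the eight nodes. The function names are hypothetical.

```python
def classify(node_used_pct, threshold=10.0):
    """Label each node relative to the average utilization.

    Assumes equal capacity per node, so the mean of the percentages
    equals the overall cluster utilization.
    """
    avg = sum(node_used_pct.values()) / len(node_used_pct)
    status = {}
    for node, pct in node_used_pct.items():
        if pct > avg + threshold:
            status[node] = "over-utilized"
        elif pct < avg - threshold:
            status[node] = "under-utilized"
        else:
            status[node] = "within threshold"
    return avg, status


def balancer_command(threshold=10):
    """Build the balancer invocation for a given threshold (in percent)."""
    return ["hdfs", "balancer", "-threshold", str(threshold)]


if __name__ == "__main__":
    # Per-node usage taken from the df output: (46 + 56) / 2, 79, 91.
    usage = {"datanode1": 51.0, "datanode2": 79.0, "datanode3": 91.0}
    avg, status = classify(usage)
    print(f"average utilization: {avg:.1f}%")
    for node, label in status.items():
        print(node, label)
    print(" ".join(balancer_command(10)))
```

With these three nodes, `datanode1` comes out under-utilized and `datanode3` over-utilized, so a balancer run at the default threshold would have work to do. A lower threshold gives a tighter balance but a longer run.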