hadoop + how to rebalance the hdfs

We have an HDP cluster, version `2.6.5`, with `8` data nodes. All machines run RHEL 7.6.

The HDP cluster is based on the Ambari platform, version `2.6.1`.

Each data node (worker machine) has two disks, and each disk is 1.8T in size.

When we log in to the data node machines, we can see that disk usage differs from node to node.

For example, on the first data node the usage is (from `df -h`):

 

/dev/sdb 1.8T 839G 996G 46% /grid/sdc
/dev/sda 1.8T 1014G 821G 56% /grid/sdb

 

On the second data node the usage is:

/dev/sdb 1.8T 1.5T 390G 79% /grid/sdc
/dev/sda 1.8T 1.5T 400G 79% /grid/sdb

 

On the third data node the usage is:

/dev/sdb 1.8T 1.7T 170G 91% /grid/sdc
/dev/sda 1.8T 1.7T 169G 91% /grid/sdb

 

and so on
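To make the imbalance concrete, here is a minimal sketch that averages the `df` percentages per node and compares each node with the average over all sampled disks. The HDFS balancer works on a similar idea: it compares each data node's utilization with the cluster average and moves blocks when the gap exceeds a threshold. The function name `usage_report` and the node-name prefix on each line are assumptions added for illustration, and `df` percentages only approximate what HDFS itself counts as DFS used.

```shell
#!/usr/bin/env bash
# usage_report: read lines of the form
#   <node> <device> <size> <used> <avail> <use%> <mount>
# and print "<node> <avg used %> <difference from overall average>".
usage_report() {
  awk '
    {
      pct = $6
      sub(/%/, "", pct)          # strip the trailing percent sign
      sum[$1] += pct; cnt[$1]++  # per-node totals
      total += pct; n++          # overall totals
    }
    END {
      avg = total / n
      for (node in sum) {
        u = sum[node] / cnt[node]
        printf "%s %.1f %+.1f\n", node, u, u - avg
      }
    }' | sort
}

# Sample data copied from the df output in this post (node names added by hand).
usage_report <<'EOF'
datanode1 /dev/sdb 1.8T 839G 996G 46% /grid/sdc
datanode1 /dev/sda 1.8T 1014G 821G 56% /grid/sdb
datanode2 /dev/sdb 1.8T 1.5T 390G 79% /grid/sdc
datanode2 /dev/sda 1.8T 1.5T 400G 79% /grid/sdb
datanode3 /dev/sdb 1.8T 1.7T 170G 91% /grid/sdc
datanode3 /dev/sda 1.8T 1.7T 169G 91% /grid/sdb
EOF
```

For the three nodes shown, this prints `datanode1 51.0 -22.7`, `datanode2 79.0 +5.3` and `datanode3 91.0 +17.3`, so `datanode1` sits well below the average of the sampled disks and `datanode3` well above it.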

 

The big question is: why does HDFS not rebalance the data across the HDFS disks?

*For example, we expected all disks on all data node machines to show the same used size.*

Why does the used size differ between `datanode1`, `datanode2`, `datanode3`, and so on?

Any advice on HDFS tuning parameters that could help us?

*This is critical, because one disk can reach `100%` usage while the others are only at around `50%`.*

Michael-Bronson
3 REPLIES

Rising Star

Hello Michael,

I had a similar issue with my CDH-based cluster and solved it with a crude workaround.
First, I changed the replication factor from 3 to 2.
/* a notice about under-replicated blocks should appear */
Then I ran the rebalance (by Blockpool, then by Datanode, to shuffle blocks between the data nodes),
set the replication factor back to 3, and after that I noticed some major changes.

I'm not sure this will work for you, but I wanted to share my experience in case you want to try it.
Good luck
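
The steps above could be sketched roughly as follows. This is only an illustration under assumptions: it changes replication per path with `hdfs dfs -setrep` (the reply does not say whether the factor was changed per path or cluster-wide), `/data` is a hypothetical HDFS path, and the `run` helper is added here so the script only prints the commands unless `DRY_RUN=0` is set. The `-policy` option of `hdfs balancer` accepts `blockpool` and `datanode`.

```shell
#!/usr/bin/env bash
# run: print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# rebalance_with_lower_replication <hdfs-path>
# Lower replication, run the balancer with both policies, restore replication.
rebalance_with_lower_replication() {
  target_path="$1"
  run hdfs dfs -setrep 2 "$target_path"      # lower the replication factor
  run hdfs balancer -policy blockpool        # balance at block pool level
  run hdfs balancer -policy datanode         # balance at data node level
  run hdfs dfs -setrep -w 3 "$target_path"   # restore and wait for completion
}

# /data is a hypothetical path; replace it with a real one on your cluster.
rebalance_with_lower_replication /data
```

Running with a lower replication factor reduces redundancy for the duration, so it is worth reviewing the printed commands in dry-run mode before executing anything.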

Can you describe in more detail what you mean by "the rebalance (by Blockpool)"?

We have an HDP cluster with Ambari, so we are not sure what we need to do.

Michael-Bronson

Contributor

Since you are using Ambari, you can try the Rebalance HDFS action, or use the Hadoop Balancer tool directly.
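
For the command-line route, a minimal sketch might look like the one below. It assumes the `hdfs` OS user is the HDFS superuser and that `sudo` is available; the threshold of 5 percent and the bandwidth of 100 MB/s are example values, not recommendations. The `run` helper and `run_balancer` function are added for illustration and only print the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# run: print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# run_balancer [threshold-percent] [bandwidth-bytes-per-second]
run_balancer() {
  threshold="${1:-10}"          # allowed deviation from average utilization, in percent
  bandwidth="${2:-104857600}"   # 100 MB/s per data node for balancing traffic
  run sudo -u hdfs hdfs dfsadmin -setBalancerBandwidth "$bandwidth"
  run sudo -u hdfs hdfs balancer -threshold "$threshold"
}

run_balancer 5
```

A lower threshold makes the balancer move more data until every data node is within that many percentage points of the cluster average, so the run takes longer. Note that this balances between data nodes; it does not even out the disks inside a single node.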