
hadoop + how to rebalance the hdfs


We have an HDP cluster, version `2.6.5`, with `8` data nodes. All machines run RHEL 7.6.

The HDP cluster is managed through Ambari, version `2.6.1`.

Each data node (worker machine) has two disks, and each disk is 1.8T.

When we log in to the data-node machines, we see that the used disk space differs from node to node.

For example, on the first data node the usage is (from `df -h`):

/dev/sdb 1.8T 839G 996G 46% /grid/sdc
/dev/sda 1.8T 1014G 821G 56% /grid/sdb

On the second data node the usage is:

/dev/sdb 1.8T 1.5T 390G 79% /grid/sdc
/dev/sda 1.8T 1.5T 400G 79% /grid/sdb

On the third data node the usage is:

/dev/sdb 1.8T 1.7T 170G 91% /grid/sdc
/dev/sda 1.8T 1.7T 169G 91% /grid/sdb

And so on for the other data nodes.
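
To put the `df -h` figures above side by side, here is a rough shell sketch. It rests on several assumptions: the node labels are added by hand, the per-node figure is the mean of the two `Use%` values (both disks are the same size), and the average is taken over only the three nodes shown, not the full 8-node cluster. The tolerance of `10` percentage points is picked to match the default threshold of the HDFS balancer.

```shell
#!/usr/bin/env bash
# df -h lines copied from above, each prefixed with a hand-added node label.
DF_LINES='datanode1 /dev/sdb 1.8T 839G 996G 46% /grid/sdc
datanode1 /dev/sda 1.8T 1014G 821G 56% /grid/sdb
datanode2 /dev/sdb 1.8T 1.5T 390G 79% /grid/sdc
datanode2 /dev/sda 1.8T 1.5T 400G 79% /grid/sdb
datanode3 /dev/sdb 1.8T 1.7T 170G 91% /grid/sdc
datanode3 /dev/sda 1.8T 1.7T 169G 91% /grid/sdb'

# Reads labelled df lines on stdin; $1 is the tolerance in percentage points.
usage_report() {
  awk -v threshold="${1:-10}" '
    {
      gsub("%", "", $6)            # Use% column, e.g. "46%" -> "46"
      sum[$1] += $6                # add up Use% per node
      cnt[$1]++                    # number of disks seen per node
    }
    END {
      n = 0
      for (node in sum) {
        pct[node] = sum[node] / cnt[node]   # mean Use% of the node
        total += pct[node]
        n++
      }
      avg = total / n                        # average over the listed nodes only
      for (node in pct) {
        d = pct[node] - avg
        state = (d > threshold || d < -threshold) ? "outside" : "within"
        printf "%s used=%.1f%% deviation=%+.1f %s\n", node, pct[node], d, state
      }
    }' | sort
}

printf '%s\n' "$DF_LINES" | usage_report 10
```

With only these three nodes, `datanode1` and `datanode3` land outside a 10-point band around the average and `datanode2` lands within it.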

 

The big question is: why does HDFS not rebalance the data across the HDFS disks?

*For example, we expected the used space to be about the same on all disks of all data-node machines.*

Why does the used space differ between `datanode1`, `datanode2`, `datanode3`, and so on?

Any advice about HDFS tuning parameters that could help us?

*This is very critical, because one disk can reach `100%` usage while the others are only at around `50%`.*

Michael-Bronson
3 REPLIES

Re: hadoop + how to rebalance the hdfs

Contributor

Hello Michael,

I had a similar issue with my CDH-based cluster and solved it with a rather crude workaround.
First I lowered the replication factor from 3 to 2
(a notice about under-replicated blocks should appear),
then ran the rebalance (by Blockpool, then by Datanode, to shuffle blocks between the data nodes),
then set the replication factor back to 3. After that I noticed some major changes.

I'm not sure this will work for you, but I wanted to share my experience in case you want to try it.
Good luck
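
For anyone who wants to try the same sequence from the command line, a minimal sketch follows. It makes several assumptions: the reply does not say how the replication factor was changed, so `hdfs dfs -setrep` is my guess; `/path/to/data` is a hypothetical HDFS path; and the `run` helper with its `DRY_RUN` switch is just a convention of this sketch, so by default the commands are only printed. Keep in mind that data has less redundancy while the replication factor is 2.

```shell
#!/usr/bin/env bash
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to really run them (requires a working hdfs client).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical HDFS path whose files should be re-replicated.
TARGET_PATH="${TARGET_PATH:-/path/to/data}"

rebalance_with_lower_replication() {
  # 1. Lower the replication factor of existing files to 2 (-w waits until done).
  run hdfs dfs -setrep -w 2 "$TARGET_PATH"
  # 2. Run the balancer, first with the blockpool policy, then with the datanode policy.
  run hdfs balancer -policy blockpool
  run hdfs balancer -policy datanode
  # 3. Restore the replication factor to 3.
  run hdfs dfs -setrep -w 3 "$TARGET_PATH"
}

rebalance_with_lower_replication
```

Both `-setrep -w` steps can take a long time on a large directory tree, since they wait for the replication change to complete.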

Re: hadoop + how to rebalance the hdfs

Can you describe in more detail what you mean by "The rebalance (by Blockpool"?

We have an HDP cluster managed with Ambari, so we are not sure what we need to do.

Michael-Bronson

Re: hadoop + how to rebalance the hdfs

Explorer

Since you are using Ambari, you can try the Rebalance HDFS action, or run the Hadoop Balancer tool directly.
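
If you go the command-line route, a minimal sketch could look like this. The threshold of 10 percent and the bandwidth value are example values, not recommendations for your cluster, and the `run` helper with its `DRY_RUN` switch is only a convention of this sketch, so by default the commands are printed rather than executed. On HDP these commands are typically run as the `hdfs` user.

```shell
#!/usr/bin/env bash
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to really run them (requires a working hdfs client).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Example values only (assumptions, tune for your own cluster).
THRESHOLD="${THRESHOLD:-10}"                        # percent of disk capacity
BANDWIDTH_BYTES="${BANDWIDTH_BYTES:-104857600}"     # bytes per second per datanode

balance_cluster() {
  # Show capacity and DFS used% for every datanode before balancing.
  run hdfs dfsadmin -report
  # Limit how much network bandwidth each datanode may use for balancing.
  run hdfs dfsadmin -setBalancerBandwidth "$BANDWIDTH_BYTES"
  # Move blocks until every datanode's utilization is within THRESHOLD
  # percentage points of the cluster-wide utilization.
  run hdfs balancer -threshold "$THRESHOLD"
}

balance_cluster
```

A smaller threshold gives a more even result but makes the balancer run longer, since more blocks have to move.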