Support Questions

mike_bronson7 · ‎01-17-2021

We have an HDP cluster, version `2.6.5`, with `8` data nodes. All machines run RHEL 7.6.

The HDP cluster is managed by Ambari, version `2.6.1`.

Each data node (worker machine) has two disks, and each disk is 1.8T.

When we log in to the data-node machines, we see differences in how much of each disk is used.

For example, on the first data node the usage is (from `df -h`):

/dev/sdb 1.8T 839G 996G 46% /grid/sdc
/dev/sda 1.8T 1014G 821G 56% /grid/sdb

On the second data node the usage is:

/dev/sdb 1.8T 1.5T 390G 79% /grid/sdc
/dev/sda 1.8T 1.5T 400G 79% /grid/sdb

On the third data node the usage is:

/dev/sdb 1.8T 1.7T 170G 91% /grid/sdc
/dev/sda 1.8T 1.7T 169G 91% /grid/sdb

and so on
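To put numbers on the gap, here is a minimal sketch that classifies the three nodes shown above. It assumes the usual balancer rule of thumb: a DataNode counts as balanced when its utilization is within a threshold (10 percentage points by default) of the cluster average. The function name `classify_nodes` is hypothetical, and the average here is taken over only the three nodes listed, not all eight, so the result is illustrative rather than what the real balancer would compute.

```python
from statistics import mean

# Use% values copied from the df -h output above (two disks per node).
DISK_USE_PERCENT = {
    "datanode1": [46, 56],
    "datanode2": [79, 79],
    "datanode3": [91, 91],
}


def classify_nodes(disk_use, threshold=10.0):
    """Label each node relative to the average utilization.

    Assumption: all disks have the same capacity, so a node's utilization
    is the plain mean of its disks' Use% values.
    """
    node_use = {node: mean(disks) for node, disks in disk_use.items()}
    cluster_use = mean(list(node_use.values()))
    labels = {}
    for node, use in node_use.items():
        if use > cluster_use + threshold:
            labels[node] = "over-utilized"
        elif use < cluster_use - threshold:
            labels[node] = "under-utilized"
        else:
            labels[node] = "within threshold"
    return labels


if __name__ == "__main__":
    for node, label in classify_nodes(DISK_USE_PERCENT).items():
        print(node, label)
```

With these three nodes the average is about 73.7%, so `datanode1` (51%) falls well below the band and `datanode3` (91%) well above it.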

The big question is: why does HDFS not rebalance the data across the HDFS disks?

*For example, we expected the used space to be about the same on all disks of all data-node machines.*

Why does the used space differ between `datanode1`, `datanode2`, `datanode3`, etc.?

Any advice about HDFS tuning parameters that can help us?

*This is critical, because one disk can reach `100%` while others are only around `50%` used.*

Michael-Bronson

Bildervic · ‎01-18-2021

Hello Michael,

I had a similar issue with my CDH-based cluster and solved it with a stupid-looking solution.
First I changed the replication factor from 3 to 2
(an under-replicated blocks notice should appear),
then ran the rebalance (by Blockpool, then by Datanode, to shuffle blocks between data nodes),
then set the replication factor back to 3. After that I noticed some major changes.

Not sure if that's going to work for you, but I just wanted to share my experience in case you want to try it.
Good luck
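The sequence described in this reply could be sketched roughly as follows. This is only an interpretation: it assumes that "changing the replication factor" means running `hdfs dfs -setrep` on existing files under `/`, and that "by Blockpool then by Datanode" refers to the balancer's `-policy` option. The helper names `rebalance_steps` and `run_steps` are hypothetical. Keep in mind that lowering replication temporarily reduces redundancy, so the script only prints the commands unless `dry_run=False` is passed.

```python
import subprocess


def rebalance_steps(path="/", low=2, normal=3):
    """Return the command sequence from the reply above, in order."""
    return [
        # Lower the replication factor of existing files and wait (-w).
        ["hdfs", "dfs", "-setrep", "-w", str(low), path],
        # Run the balancer once per policy.
        ["hdfs", "balancer", "-policy", "blockpool"],
        ["hdfs", "balancer", "-policy", "datanode"],
        # Restore the original replication factor.
        ["hdfs", "dfs", "-setrep", "-w", str(normal), path],
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence if a command exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_steps(rebalance_steps())
```

On a real cluster these commands would normally be run as the HDFS superuser, and the `-setrep -w` steps can take a long time on a large namespace.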

mike_bronson7 · ‎02-11-2021

Can you describe in more detail what you mean by "the rebalance (by Blockpool"?

We have an HDP cluster with Ambari, so we are not sure what we need to do.

Michael-Bronson

Kezia · ‎02-11-2021

Since you are using Ambari, you can try the Rebalance HDFS action, or run the Hadoop Balancer tool directly.
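For the command-line route, a minimal sketch of how the balancer might be invoked is shown below. It builds the `hdfs balancer -threshold` command and, optionally, an `hdfs dfsadmin -setBalancerBandwidth` command to raise the per-DataNode transfer limit first. The function name `balancer_commands` and the example values (threshold 5, 100 MB/s) are assumptions for illustration, not recommendations from this thread.

```python
def balancer_commands(threshold_percent=10, bandwidth_bytes_per_sec=None):
    """Build the balancer command line, optionally preceded by a bandwidth change.

    threshold_percent: how far (in percentage points) a DataNode's utilization
    may deviate from the cluster average before blocks are moved.
    bandwidth_bytes_per_sec: if given, the balancer bandwidth limit to set
    on the DataNodes before starting.
    """
    commands = []
    if bandwidth_bytes_per_sec is not None:
        commands.append(
            ["hdfs", "dfsadmin", "-setBalancerBandwidth", str(bandwidth_bytes_per_sec)]
        )
    commands.append(["hdfs", "balancer", "-threshold", str(threshold_percent)])
    return commands


if __name__ == "__main__":
    # Example values only: tighter threshold, 100 MB/s bandwidth limit.
    for cmd in balancer_commands(5, 100 * 1024 * 1024):
        print(" ".join(cmd))
```

A lower threshold makes the balancer move more data and run longer. Note also that the balancer evens out usage between DataNodes; it does not move blocks between the disks inside one DataNode.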

Cloudera Community

hadoop + how to rebalance the hdfs