Support Questions


Incorrect hard disk balancing between DataNodes

New Contributor

Hi all.
We have a cloud HDFS cluster.
We have DataNodes with different '/data' sizes (from 1 TB to 3 TB). Node #1, which has 3 TB, is 500 MB full, while node #2, which has 1 TB, is 900 MB full. Can you please tell me why this is happening and how to fix it?
The rebalancing threshold is set to 5.

1 ACCEPTED SOLUTION

Super Collaborator

Hi @Anlarin ,

 

It is always suggested to have homogeneous disk storage across DataNodes. Within a DataNode, if the volumes are heterogeneous, then when block replicas are written to the disks in a round-robin fashion, the disks with less capacity fill up faster than the larger ones. In addition, if the client is local to node #2, it places the first replica of each block on that node, so that node is expected to fill faster.

 

By choosing the "Available Space" policy, the DataNodes take into account how much space is available on each volume/disk when deciding where to place a new replica. To get writes that are evenly distributed as a percentage of capacity across drives, change the volume choosing policy (dfs.datanode.fsdataset.volume.choosing.policy) to Available Space.
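
If hdfs-site.xml is managed by hand rather than through Cloudera Manager, a minimal sketch of the change looks like this (the class name is the standard Hadoop policy class; the two tuning properties are optional and shown with their stock default values purely for illustration):

  <!-- Prefer emptier volumes when placing new replicas on this DataNode -->
  <property>
    <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
    <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
  </property>
  <!-- Volumes whose free space differs by less than this many bytes are treated as balanced (default 10 GB) -->
  <property>
    <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
    <value>10737418240</value>
  </property>
  <!-- Fraction of new blocks directed to the volumes with more free space when unbalanced (default 0.75) -->
  <property>
    <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
    <value>0.75</value>
  </property>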

 

If using Cloudera Manager: navigate to HDFS > Configuration > DataNode and change "DataNode Volume Choosing Policy" from Round Robin to Available Space.

 

Click Save Changes

Restart the DataNodes

 

The above property only helps balance the volumes within a single DataNode (see: https://docs.cloudera.com/documentation/enterprise/latest/topics/admin_dn_storage_balancing.html).
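
Balancing data between DataNodes is handled by the HDFS Balancer instead. Note that with a threshold of 5 the Balancer only moves blocks once a node's utilization deviates from the cluster average by more than 5 percentage points, and with both of your nodes well under 1% used it will not move anything. As a minimal sketch (the -threshold value simply mirrors the setting you mentioned):

  hdfs balancer -threshold 5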

 

-
Was your question answered? Please take some time to click on “Accept as Solution” below this post.
If you find a reply useful, say thanks by clicking on the thumbs up button.

