Created 09-19-2022 01:34 AM
Hi all.
We have a cloud HDFS cluster.
We have DataNodes with different '/data' sizes (from 1TB to 3TB). Node #1, which has 3TB, is 500Mb full, while node #2, which has 1TB, is 900Mb full. Can you please tell me why this is happening and how to fix it?
The rebalancing threshold is set to 5.
Created 09-19-2022 02:10 AM
Hi @Anlarin ,
It is always suggested to have homogeneous disk storage across DataNodes. Within a DataNode, if the volumes are heterogeneous, block replicas are written to the disks in round-robin fashion, so the smaller disks fill up faster than the larger ones. If the client writing the data is local to Node 2, HDFS places the first replica of each block on that node, so it is expected to fill faster.
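To illustrate why round-robin placement fills small disks faster, here is a minimal Python sketch. It is a toy model, not HDFS code: the volume sizes, block size, and the `most_free` policy (always pick the volume with the most free space) are assumptions for illustration, and the real Available Space policy is more nuanced than this.

```python
def simulate(capacities_gb, total_blocks, block_gb, choose):
    """Write blocks one by one and return percent used per volume."""
    used = [0.0] * len(capacities_gb)
    for i in range(total_blocks):
        volume = choose(i, capacities_gb, used)
        used[volume] += block_gb
    return [round(100 * u / c, 1) for u, c in zip(used, capacities_gb)]


def round_robin(i, capacities, used):
    # Cycle through the volumes regardless of how full each one is.
    return i % len(capacities)


def most_free(i, capacities, used):
    # Hypothetical simplification: always pick the volume with the most free space.
    free = [c - u for c, u in zip(capacities, used)]
    return free.index(max(free))


# Assumed example: one 1 TB volume and one 3 TB volume, 800 GB written in total.
capacities = [1000, 3000]
rr = simulate(capacities, 1600, 0.5, round_robin)
mf = simulate(capacities, 1600, 0.5, most_free)
print("round robin, % used per volume:", rr)
print("most free,   % used per volume:", mf)
```

With round robin, both volumes receive the same number of bytes, so the 1 TB volume ends up at a much higher percentage of its capacity than the 3 TB volume.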
By choosing the "Available Space" policy, the DataNodes take into account how much space is available on each volume/disk when deciding where to place a new replica. To get writes that are evenly distributed as a percentage of capacity across the drives, change the volume choosing policy (dfs.datanode.fsdataset.volume.choosing.policy) to Available Space.
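If the cluster is not managed through Cloudera Manager, the same setting would normally go into hdfs-site.xml on the DataNodes. The fragment below is a sketch: the property name is the one mentioned above, and the value is the Available Space policy class as shipped in Apache Hadoop. Please check it against the documentation for your version before applying it.

```xml
<!-- hdfs-site.xml on each DataNode (sketch; verify against your Hadoop version) -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
```

As with the Cloudera Manager route, the DataNodes need a restart for the change to take effect.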
If using Cloudera Manager:
Navigate to HDFS > Configuration > DataNode
Change DataNode Volume Choosing Policy from Round Robin to Available Space
Click Save Changes
Restart the DataNodes
The above property only helps with balancing across the volumes within a single DataNode. https://docs.cloudera.com/documentation/enterprise/latest/topics/admin_dn_storage_balancing.html