List of Block Placement Algorithms for HDFS

Super Collaborator

HDFS-385 introduced a pluggable interface for HDFS block replica placement. I found multiple HDFS JIRAs pointing to various such algorithms, but no documentation listing the available choices. HDFS-8789 will allow migration of the block placement policy, but that JIRA has no further details. Can anyone help identify the list of such algorithms?

1 ACCEPTED SOLUTION

Expert Contributor

A quick search of the code base shows that we have the following policies:

  • AvailableSpaceBlockPlacementPolicy
  • BlockPlacementPolicyDefault
  • BlockPlacementPolicyRackFaultTolerant
  • BlockPlacementPolicyWithNodeGroup
  • BlockPlacementPolicyWithUpgradeDomain
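As a rough sketch of how one of the policies listed above might be selected, the snippet below renders an hdfs-site.xml property for it. It rests on two assumptions that are not stated in this thread and should be checked against your Hadoop version: that the classes live in the package org.apache.hadoop.hdfs.server.blockmanagement, and that the policy is chosen through the dfs.block.replicator.classname key. The helper name policy_property_xml is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: package of the policy classes in the Hadoop source tree.
POLICY_PACKAGE = "org.apache.hadoop.hdfs.server.blockmanagement"

# Assumption: hdfs-site.xml key that names the placement policy class
# (verify against hdfs-default.xml for your release).
POLICY_KEY = "dfs.block.replicator.classname"

# Simple class names exactly as listed in the answer above.
POLICIES = [
    "AvailableSpaceBlockPlacementPolicy",
    "BlockPlacementPolicyDefault",
    "BlockPlacementPolicyRackFaultTolerant",
    "BlockPlacementPolicyWithNodeGroup",
    "BlockPlacementPolicyWithUpgradeDomain",
]


def policy_property_xml(policy: str) -> str:
    """Render an hdfs-site.xml <property> element for one listed policy.

    Hypothetical helper for illustration; it only builds a string and
    does not touch a cluster.
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown placement policy: {policy}")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = POLICY_KEY
    ET.SubElement(prop, "value").text = f"{POLICY_PACKAGE}.{policy}"
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    print(policy_property_xml("BlockPlacementPolicyRackFaultTolerant"))
```

The resulting element would go inside the configuration block of hdfs-site.xml on the namenode. Some policies likely need additional settings (topology or upgrade domain configuration, for example), which this sketch does not cover.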

> yet didn't find any documentation listing the available choices.

You are absolutely right; we can certainly do better at documenting this. Thanks for pointing it out. I will address this in an Apache JIRA.

Expert Contributor

@smdas Sorry, I forgot to tag you.

Expert Contributor

@aengineer Following up on Smarak's question, do we have any further information on this?

HDFS-8789 is still unresolved, so it looks like a change in block placement policy still does not apply to existing blocks.

Any thoughts?

Expert Contributor

@aengineer

HDFS-14637 and HDFS-8789 appear to contradict each other. HDFS-14637 says that after the network topology or placement policy is changed on a cluster and the namenode is restarted, the namenode scans all blocks on the cluster at startup and checks whether they meet the current placement policy. Blocks that do not are added to the replication queue, and the namenode arranges for them to be re-replicated so that the placement policy is satisfied.

It would be good to get some clarity on this.

Expert Contributor

@aengineer

I also noticed a comment in HDFS-8789 saying that "balancer doesn't support anything other than the default placement policy (BlockPlacementPolicyDefault)."

HDFS-14053 says the ability for the NN to re-replicate blocks after a policy change is fixed in Hadoop 3.3.0 [I am not sure whether that refers to the Hadoop version, though an NN version wouldn't make sense], while HDFS-14637 seems to support the above statement until UD (upgrade domain) is enabled.