HDFS Balancer: Why configure same property?

avatar
Expert Contributor

I am trying to rebalance HDFS with Cloudera Manager 6.3, following the HDFS Balancer documentation.

It says to add the same property, dfs.datanode.balance.max.concurrent.moves, to two different sections:

  1.  DataNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml
  2.  Balancer Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml

But before adding the property, I searched and saw that dfs.datanode.balance.max.concurrent.moves was already there. Nevertheless, I did what the document says. After I added the properties, Cloudera Manager asked me to restart and redeploy the stale configurations. Before restarting, I saw that totally different properties appeared to be added.

cloudera_manager_rebalancer_cloudera_community_soru.png

I don't understand: although we seem to add the same property, why are different properties added to hdfs-site.xml?

6 REPLIES

avatar

@erkansirin78 Let me make sure I understand the issue correctly. By "Before restart, I saw totally different properties added," did you mean the property dfs.datanode.ec.reconstruction.xmits.weight getting added? If so, it is not being added. The preview page just shows a few extra lines of context before the property that you added; only the lines with a + sign matter.

avatar
Expert Contributor

Yes, you are right. I had not realized that. But if dfs.datanode.ec.reconstruction.xmits.weight is already in hdfs-site.xml, why does the Cloudera document make us add the same property for the Balancer and the DataNode again? What is the point?

avatar

Just a correction: the document suggests tuning the property dfs.datanode.balance.max.concurrent.moves, not dfs.datanode.ec.reconstruction.xmits.weight.

Regarding the question of why to add dfs.datanode.balance.max.concurrent.moves again when it is already present for the DataNode and the Balancer: the doc says "Add the following code to the configuration field, for example, setting the value to 50." That is, 50 is just an example number, and the document doesn't mandate setting the value to 50. You can tune it to any value that fits your requirements.
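
As a rough illustration of the kind of entry the documentation describes, a safety-valve snippet for hdfs-site.xml might look like the following. The value 50 is only the documentation's example number, not a recommendation.

```xml
<!-- Example entry for an Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml. -->
<!-- The value 50 is illustrative only; tune it to suit your cluster. -->
<property>
  <name>dfs.datanode.balance.max.concurrent.moves</name>
  <value>50</value>
</property>
```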

 

Then why add it in both the Balancer and the DataNode?

Setting it on the HDFS Balancer (the client) gives you the flexibility to change this value on the client side at runtime; that is, you can set the property to a value less than or equal to what you have configured on the DataNode side. We set it on the server side to impose an upper limit on the value that can be configured. If you configure a value greater than what you have set on the DataNode (the server), the DataNodes will fail it.
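
A minimal sketch of that client/server relationship, assuming a hypothetical DataNode limit of 50 and a hypothetical Balancer value of 20 (both numbers are made up for illustration; the only point is that the Balancer value stays at or below the DataNode value):

```xml
<!-- Scope 1: DataNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml. -->
<!-- Server side: the upper limit the DataNodes will accept (hypothetical value). -->
<property>
  <name>dfs.datanode.balance.max.concurrent.moves</name>
  <value>50</value>
</property>

<!-- Scope 2: Balancer configuration (safety valve or the "Maximum Concurrent Moves" field). -->
<!-- Client side: should be less than or equal to the DataNode value (hypothetical value). -->
<property>
  <name>dfs.datanode.balance.max.concurrent.moves</name>
  <value>20</value>
</property>
```

The two entries belong in separate configuration scopes in Cloudera Manager, so each role receives only its own value in its generated hdfs-site.xml.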

avatar
Expert Contributor

I know, thank you, but I still don't understand why we add the same property (dfs.datanode.balance.max.concurrent.moves) in two different sections

  1.  DataNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml
  2.  Balancer Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml

although the same property (dfs.datanode.balance.max.concurrent.moves) already exists in Cloudera Manager. Is CM supposed to refuse this addition?

avatar

"although same property (dfs.datanode.balance.max.concurrent.moves) already exists in Cloudera Manager." --> OK, I assume you are referring to the one highlighted in the screenshot below.

DFS-concurrent.png

Yes, it is unnecessary to add dfs.datanode.balance.max.concurrent.moves in the Balancer Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml if you have used the "Maximum Concurrent Moves" field. Also note that "Maximum Concurrent Moves" is scoped only to the Balancer and not to the DataNodes. So for the DataNodes you have to set it explicitly using the "DataNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml".

The reason for adding this property for both the Balancer and the DataNode is explained in my previous comment. I hope that clarifies things; let me know if you have further questions.

I will raise an internal Jira to correct the document and avoid the duplicate entry in the Balancer safety valve.

avatar
Expert Contributor

Thank you very much. This is the answer that satisfies me. Documents are expected to make things clear and simple, not complicated.