Created on 03-22-2020 02:49 AM - last edited on 03-22-2020 08:15 AM by ask_bill_brooks
I am trying to rebalance HDFS with Cloudera Manager 6.3 by following the HDFS Balancer documentation.
It says to add the same property, dfs.datanode.balance.max.concurrent.moves, in different sections.
Before adding the property, I searched and saw that dfs.datanode.balance.max.concurrent.moves was already there. Nevertheless, I did what the document says. After I added the properties, Cloudera Manager asked me to restart/redeploy stale configurations. Before restarting, I saw that totally different properties had been added.
I don't understand: if we are adding the same property, why are different properties added to hdfs-site.xml?
Created 03-22-2020 06:09 AM
@erkansirin78 Let me make sure I understand the issue correctly. By "Before restart, I saw totally different properties added," did you mean that the property dfs.datanode.ec.reconstruction.xmits.weight was added? If so, it is not actually being added. The preview page just shows some existing lines before the property that you added; only the lines with a + sign matter.
Created 03-22-2020 12:15 PM
Yes, you are right. I had not realized that. But if dfs.datanode.ec.reconstruction.xmits.weight is already in hdfs-site.xml, why does the Cloudera document make us add the same property for the Balancer and the DataNode again? What is the point?
Created 03-22-2020 08:16 PM
Just a correction: the document suggests tuning the property dfs.datanode.balance.max.concurrent.moves, not dfs.datanode.ec.reconstruction.xmits.weight.
As for why to add dfs.datanode.balance.max.concurrent.moves again when it is already present for the DataNode and the Balancer: the doc says "Add the following code to the configuration field, for example, setting the value to 50." In other words, 50 is just an example number, and the document doesn't mandate it. You can tune it to whatever value suits your requirements.
Then why add it in both the Balancer and the DataNode?
Setting it on the HDFS Balancer (client) gives you the flexibility to change this value on the client side at runtime, i.e. you can set this property to a value less than or equal to what you have configured on the DataNode side. The reason we set it on the server side is to impose an upper limit on the value that can be configured. If you configure a value greater than what you have set on the DataNode (server), the DataNodes will fail the request.
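As a rough illustration of the two settings, here is a sketch of what the entries could look like. The two properties belong to two different role configurations in Cloudera Manager and are shown in one block only for comparison. The value 50 is the documentation's example number, and 40 is a hypothetical value picked only because it is less than or equal to the DataNode limit; neither is a recommendation.

```xml
<!-- DataNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml -->
<!-- Server side: acts as the upper limit. 50 is the documentation's example value. -->
<property>
  <name>dfs.datanode.balance.max.concurrent.moves</name>
  <value>50</value>
</property>

<!-- Balancer role configuration -->
<!-- Client side: hypothetical value, chosen to be less than or equal to the DataNode limit above. -->
<property>
  <name>dfs.datanode.balance.max.concurrent.moves</name>
  <value>40</value>
</property>
```

With values like these, the Balancer value can later be changed anywhere up to the DataNode value without touching the DataNode configuration.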
Created 03-23-2020 01:39 AM
I know, thank you, but I still don't understand why we add the same property (dfs.datanode.balance.max.concurrent.moves) in a different section although the same property (dfs.datanode.balance.max.concurrent.moves) already exists in Cloudera Manager. Is CM supposed to refuse this addition?
Created 03-23-2020 03:59 AM
"although same property (dfs.datanode.balance.max.concurrent.moves) already exists in Cloudera Manager." --> Okay, I assume you are referring to the one highlighted in the screenshot below.
Yes, it's unnecessary to add dfs.datanode.balance.max.concurrent.moves in the Balancer Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml if you have used the "Maximum Concurrent Moves" setting. Also note that "Maximum Concurrent Moves" is scoped only to the Balancer and not to the DataNodes. So for the DataNodes you have to set it explicitly using the "DataNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml".
The reason for adding this property for both the Balancer and the DataNode is covered in my previous comment. I hope that clarifies things; let me know if you have further questions.
I will raise an internal Jira to correct the document and avoid the duplicate entry in the Balancer safety valve.
Created 03-25-2020 06:28 AM
Thank you very much. This is the answer that satisfies me. Documentation is expected to make things clear and simple, not complicated.