Support Questions

How to make data replication faster after adding new nodes to the cluster

We have 32 DataNodes in the cluster and recently added 2 new DataNodes. However, data replication after running the balancer is very slow and takes a long time. Does modifying the parameter dfs.datanode.max.transfer.threads have an impact on this? Also, how should we calculate the value it should be set to?

Accepted solution

@Piyali Gupta

Here are the steps, taken from the article linked below, to increase the HDFS Balancer network bandwidth so that data is balanced between nodes faster:

hdfs dfsadmin -setBalancerBandwidth 100000000
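
The argument to hdfs dfsadmin -setBalancerBandwidth is the maximum number of bytes per second that each DataNode may use for balancing, so 100000000 is roughly 100 MB/s per DataNode. According to the Hadoop documentation, this runtime value overrides dfs.datanode.balance.bandwidthPerSec but does not persist across DataNode restarts. The sketch below is a minimal example that converts a target rate into that argument. It assumes 1 MB = 10^6 bytes, and the helper name mb_per_sec_to_bytes is hypothetical. To keep the step reviewable, the script only prints the command and does not run it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a target balancing rate in MB/s (10^6 bytes)
# into the bytes-per-second value that -setBalancerBandwidth expects.
mb_per_sec_to_bytes() {
  local mb="$1"
  echo $(( mb * 1000000 ))
}

# Print (rather than execute) the command so it can be checked first.
bw=$(mb_per_sec_to_bytes 100)
echo "hdfs dfsadmin -setBalancerBandwidth ${bw}"
```

Running the script prints the same command shown above, with 100000000 as the argument.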

Then, on all the DataNodes and the client, we ran the command below:

hdfs balancer -Dfs.defaultFS=hdfs://<NN_HOSTNAME>:8020 \
  -Ddfs.balancer.movedWinWidth=5400000 \
  -Ddfs.balancer.moverThreads=1000 \
  -Ddfs.balancer.dispatcherThreads=200 \
  -Ddfs.datanode.balance.max.concurrent.moves=5 \
  -Ddfs.balance.bandwidthPerSec=100000000 \
  -Ddfs.balancer.max-size-to-move=10737418240 \
  -threshold 5
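
Several numbers in the balancer command above are raw milliseconds or bytes. dfs.balancer.movedWinWidth is in milliseconds, while dfs.balancer.max-size-to-move and the bandwidth value are in bytes. The helper functions below are hypothetical and exist only to decode these values into readable units. They use bash integer arithmetic, so any result is truncated.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading the balancer options in human units.
# Bash arithmetic is integer-only, so results are truncated.
ms_to_minutes() { echo $(( $1 / 60000 )); }     # milliseconds -> minutes
bytes_to_gib()  { echo $(( $1 / 1024**3 )); }   # bytes -> GiB (2^30 bytes)
bytes_to_mb()   { echo $(( $1 / 1000000 )); }   # bytes -> MB (10^6 bytes)

echo "dfs.balancer.movedWinWidth=5400000 -> $(ms_to_minutes 5400000) minutes"
echo "dfs.balancer.max-size-to-move=10737418240 -> $(bytes_to_gib 10737418240) GiB"
echo "bandwidthPerSec=100000000 -> $(bytes_to_mb 100000000) MB/s per DataNode"
```

So the command uses a 90-minute moved window, a 10 GiB max-size-to-move, and about 100 MB/s of bandwidth.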

This should balance your HDFS data across the DataNodes faster. Run it when the cluster is not heavily used.
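
The -threshold 5 option in the balancer command above sets when the cluster counts as balanced. A DataNode is considered balanced when its disk utilization is within 5 percentage points of the cluster-wide average. The sketch below is a simplified model of that rule. The real balancer also separates above-average from below-average nodes and works with fractional percentages. The function name classify_node and the example utilizations are hypothetical. The example shows why newly added, nearly empty DataNodes become the main targets of a balancer run.

```shell
#!/usr/bin/env bash
# Simplified model of the balancer threshold rule (integer percentages only):
# a node is over- or under-utilized when its disk utilization differs from
# the cluster average by more than the threshold, otherwise balanced.
classify_node() {
  local util="$1" avg="$2" threshold="$3"
  if (( util > avg + threshold )); then
    echo "over-utilized"
  elif (( util < avg - threshold )); then
    echo "under-utilized"
  else
    echo "balanced"
  fi
}

# Hypothetical cluster average of 60% with -threshold 5:
classify_node 10 60 5   # a new, nearly empty DataNode
classify_node 63 60 5   # within 5 points of the average
classify_node 70 60 5   # more than 5 points above the average
```

The three calls print "under-utilized", "balanced", and "over-utilized", in that order. A smaller threshold produces a more even cluster, but the balancer has to move more data to reach it.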

A couple of links to the relevant articles: https://community.hortonworks.com/articles/51935/how-to-increase-hdfs-balancer-network-bandwidth-fo....

https://community.hortonworks.com/articles/43849/hdfs-balancer-2-configurations-cli-options.html

Hope this helps you.
