balancer is slow...

Expert Contributor

Hello

We added 5 new DataNodes to our HDP cluster. We are running the balancer manually so that the newly added DNs get balanced with the rest (manually because of the Ambari bug described here: https://community.hortonworks.com/articles/4595/balancer-not-working-in-hdfs-ha.html), with the default threshold of 10, and we changed dfs.datanode.balance.bandwidthPerSec from its default to 100000000. However, data is moving from the old DNs to the new ones at only roughly 20-30 GB a day. The balancer has been running for 3 days straight and the newly added disks hold only about 63 GB of data.

Is there a way to increase the balancer's speed? By the way, all of the jobs were stopped, so the cluster is completely idle.

Thanks in advance.

Adi

Accepted Solutions

Re: balancer is slow...

Hi @Adi Jabkowsky, the balancer runs in the background and is slow by design, so there is no need to worry. You can keep using the cluster, running jobs, etc. Increasing the balancer bandwidth to 100 MB/s is good, and running it from the command line instead of from Ambari is also a good choice. To get a better sense of completion and of the time required, next time you can run it first with a 20% threshold, then run it again with 15%, and finally with 10%.
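
The staged approach described above could be sketched roughly as follows. This is only an illustration: the function name `staged_balancer_commands` is made up for this example, and it assumes the commands are run on a cluster node by a user allowed to run the balancer (typically the HDFS superuser). The function only prints the commands, so nothing runs until the output is piped to `sh`.

```shell
# Hypothetical helper: print one balancer invocation per threshold,
# from the loosest threshold to the tightest.
staged_balancer_commands() {
  for threshold in "$@"; do
    echo "hdfs balancer -threshold ${threshold}"
  done
}

# Dry run: show the three passes (20%, then 15%, then 10%).
staged_balancer_commands 20 15 10

# To actually run the passes one after another on a cluster node:
#   staged_balancer_commands 20 15 10 | sh
```

Each pass finishes sooner than a single run at 10% would, which gives a rough idea of how long the next, tighter pass is likely to take.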

Re: balancer is slow...

Expert Contributor

Hi @Predrag Minovic, thank you for responding!

Re: balancer is slow...

New Contributor

Please let me know what command to run.

Re: balancer is slow...

Basically what Predrag said, but there is also a way to increase the number of threads doing the moving:

Bandwidth, already covered: dfs.datanode.balance.bandwidthPerSec=100000000

Increasing the balancer move threads. This has to be set in the HDFS configuration of all DataNodes, not on the client, i.e. it requires a restart:

dfs.datanode.balance.max.concurrent.moves=500

Increasing transfer threads to keep up with it.

dfs.datanode.max.transfer.threads=16384
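
For a sense of scale, here is a rough back-of-the-envelope calculation using the numbers from the question (a cap of 100000000 bytes/s per DataNode, and about 30 GB moved per day). It uses decimal gigabytes and integer arithmetic, so the results are approximate. It suggests the observed rate is far below the bandwidth cap, which is consistent with the number of concurrent moves, rather than the bandwidth, being the limit.

```shell
# Values taken from the question; GB here means 10^9 bytes.
CAP_BYTES_PER_SEC=100000000
OBSERVED_GB_PER_DAY=30
SECONDS_PER_DAY=86400

# What one DataNode could move per day if it ran at the cap all day.
cap_gb_per_day=$(( CAP_BYTES_PER_SEC * SECONDS_PER_DAY / 1000000000 ))

# The observed daily volume expressed as an average rate.
observed_bytes_per_sec=$(( OBSERVED_GB_PER_DAY * 1000000000 / SECONDS_PER_DAY ))

echo "Cap per DataNode: ${cap_gb_per_day} GB/day"
echo "Observed average: ${observed_bytes_per_sec} bytes/s"
```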

Re: balancer is slow...

New Contributor

Which files do I need to edit, and how do I run it?

Re: balancer is slow...

These parameters are part of the HDFS configuration. You can set them in Ambari: look for the threads one, and add a custom parameter for the moves one. Set the bandwidth similarly, or on the command line of the balancer. That one can be done client side.
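
As a minimal sketch of the client-side part, the bandwidth can be changed at runtime with `hdfs dfsadmin -setBalancerBandwidth` before starting the balancer. The value set this way is not persistent, so it should also be set in the HDFS configuration. The `run` wrapper and the `DRY_RUN` variable are made up for this example; by default the script only prints the commands, and it assumes a cluster node and a user with HDFS superuser rights when run for real.

```shell
# Hypothetical dry-run wrapper: print commands by default,
# execute them only when DRY_RUN=0 is set on a cluster node.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Bandwidth in bytes per second, as used earlier in the thread.
BANDWIDTH=100000000

# Raise the per-DataNode balancing bandwidth at runtime (not persistent).
run hdfs dfsadmin -setBalancerBandwidth "$BANDWIDTH"

# Start the balancer with the default threshold of 10 percent.
run hdfs balancer -threshold 10
```

The two thread settings are not covered here because, as noted above, they are DataNode-side settings that go into the HDFS configuration.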

Re: balancer is slow...

Explorer

I have started 20 balancers at once before. Yes, the official documentation says to run only one, but I checked the source code: the comment says you cannot start multiple balancer processes, but the code for that is not finished. And the speed was very fast!

I am not sure what the serious disadvantage of starting multiple balancers is. Maybe network load, I/O, or something else.