balancer is slow...

Expert Contributor

Hello

We added 5 new DataNodes to our HDP cluster. We run the balancer manually so that the newly added DNs get balanced with the rest (manually because of the Ambari bug described here: https://community.hortonworks.com/articles/4595/balancer-not-working-in-hdfs-ha.html), with the default threshold of 10, and we changed dfs.datanode.balance.bandwidthPerSec from its default to 100000000. However, data is moving from the old DNs to the new ones at only roughly 20-30 GB a day. The balancer has been running for 3 days straight and the newly added disks hold only about 63 GB of data.

Is there a way to increase the balancer's speed? By the way, all of the jobs were stopped, so the cluster is completely idle.

Thanks in advance.

Adi

1 ACCEPTED SOLUTION

Re: balancer is slow...

Hi @Adi Jabkowsky, the balancer runs in the background and is slow by design, so there is no need to worry. You can keep using the cluster, running jobs, etc. Increasing the balancer bandwidth to 100 MB/s is good, and running it from the command line instead of from Ambari is also a good choice. To get a better sense of progress and of the time required, next time you can run it first with a 20% threshold, then run it again with 15%, and finally with 10%.
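
To illustrate why a looser threshold finishes sooner: the balancer treats a DataNode as balanced when its utilization is within the threshold (in percentage points) of the cluster's average utilization. Here is a minimal Python sketch of that check. The node names and utilization figures are hypothetical, not taken from this cluster, and the sketch assumes all nodes have equal capacity so that the plain mean stands in for cluster utilization.

```python
# Hypothetical snapshot of per-DataNode utilization (% of capacity used),
# taken partway through a balancing run. Names and values are made up.
utilization = {
    "old-dn1": 60.0,
    "old-dn2": 55.0,
    "old-dn3": 50.0,
    "new-dn1": 30.0,
    "new-dn2": 38.0,
}


def out_of_balance(utilization, threshold):
    """Return the nodes whose utilization differs from the average
    by more than `threshold` percentage points."""
    # Assumption: equal-capacity nodes, so the mean of the percentages
    # equals the cluster-wide utilization.
    average = sum(utilization.values()) / len(utilization)
    return sorted(
        node
        for node, used in utilization.items()
        if abs(used - average) > threshold
    )


for threshold in (20, 15, 10):
    print(threshold, out_of_balance(utilization, threshold))
```

With these made-up numbers the list of nodes still needing work grows as the threshold tightens, which is why each pass at 20%, 15%, and 10% has a clearer end point than one long run at 10%.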

Re: balancer is slow...

Expert Contributor

Hi @Predrag Minovic Thank you for responding!

Re: balancer is slow...

New Contributor

Please let me know which command to run.

Re: balancer is slow...

Basically what Predrag said, but there is also a way to increase the number of threads doing the moving.

Bandwidth, already covered: dfs.datanode.balance.bandwidthPerSec=100000000

Increase the balancer move threads. This needs to be set in the HDFS configuration of all DataNodes, not on the client, so it requires a DataNode restart:

dfs.datanode.balance.max.concurrent.moves=500

Increase the transfer threads to keep up with it:

dfs.datanode.max.transfer.threads=16384
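
A quick calculation shows why the bandwidth setting alone is probably not the limiting factor here. This is only a rough Python sketch: it converts the 20-30 GB a day reported in the question into an average rate and compares it with the configured 100000000 bytes/s. Decimal gigabytes are an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_second(gb_per_day):
    """Convert a daily volume in decimal GB to an average rate in bytes/s."""
    return gb_per_day * 1_000_000_000 / SECONDS_PER_DAY


# Bandwidth value from the question, in bytes per second per DataNode.
configured_cap = 100_000_000

for gb_per_day in (20, 30):
    rate = bytes_per_second(gb_per_day)
    print(
        f"{gb_per_day} GB/day = {rate / 1e6:.2f} MB/s, "
        f"{rate / configured_cap:.2%} of the configured cap"
    )
```

The observed rate is well under 1% of the cap, and the cap applies per DataNode while the observed figure is for the whole cluster, so the number of concurrent moves is the more likely place to look.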

Re: balancer is slow...

New Contributor

Which files do I need to edit, and how do I run it?

Re: balancer is slow...

These parameters are part of the HDFS configuration, and you can set them in Ambari. Look for the existing transfer threads parameter, and add a custom parameter for the concurrent moves one. Set the bandwidth the same way, or on the command line of the balancer; that one can be done client side.
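
For a cluster that is not managed through Ambari, these settings would normally live in hdfs-site.xml as property elements. Below is a minimal Python sketch that renders the three values mentioned in this thread in that format. The helper name render_properties is hypothetical, and the values are simply the ones suggested above, not tuned recommendations.

```python
import xml.etree.ElementTree as ET

# Property names and values as suggested earlier in this thread.
settings = {
    "dfs.datanode.balance.bandwidthPerSec": "100000000",
    "dfs.datanode.balance.max.concurrent.moves": "500",
    "dfs.datanode.max.transfer.threads": "16384",
}


def render_properties(settings):
    """Render settings as a Hadoop-style <configuration> XML fragment."""
    config = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # ET.indent is only available on Python 3.9+; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(config)
    return ET.tostring(config, encoding="unicode")


print(render_properties(settings))
```

The balancer itself is started from the command line with hdfs balancer -threshold followed by the percentage, for example 10 as in the original question.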

Re: balancer is slow...

Explorer

I have started 20 balancers at once before. Yes, the official documentation says to run only one, but I checked the source code: the comment says you cannot start multiple processes to do the balancing, yet the code that would enforce this is not finished. And the speed is very fast!

I am not sure what the serious disadvantages of starting multiple balancers are. Maybe network load, I/O load, or something else.
