Help with exception from HDFS balancer

Super Collaborator

We recently tried to run the HDFS balancer for the first time.

(Somehow we've been using HDP for almost 2 years and never knew that we should be doing this)

After about an hour, it showed "5.56 GB moved / 12.72 TB left / 40 GB being processed".

Now (10 hours later), it still says the same thing. Does anyone know what the issue or solution here is?

We're on HDP 2.2.8 running in HA. There's nothing in stderr.

stdout is below:

[balancer] 16/02/24 21:11:51 WARN balancer.Dispatcher: Failed to move blk_1083404960_9672884 with size=134217728 from 10.22.4.44:50010:DISK to 10.22.6.22:50010:DISK through 10.22.4.46:50010: block move is failed: Not able to receive block 1083404960 from /10.22.4.64:38809 because threads quota is exceeded.
[balancer] 16/02/24 21:11:51 INFO balancer.Dispatcher: DDatanode:10.22.4.46:50010 activateDelay 10.0 seconds
16/02/24 21:11:51 WARN balancer.Dispatcher: Failed to move blk_1083404871_9672795 with size=134217728 from 10.22.4.44:50010:DISK to 10.22.6.22:50010:DISK through 10.22.4.44:50010: block move is failed: Not able to receive block 1083404871 from /10.22.4.64:38810 because threads quota is exceeded.
16/02/24 21:11:51 INFO balancer.Dispatcher: DDatanode:10.22.6.22:50010 activateDelay 10.0 seconds
[balancer] 16/02/24 21:11:51 INFO balancer.Dispatcher: DDatanode:10.22.4.44:50010 activateDelay 10.0 seconds
16/02/24 21:11:51 INFO balancer.Dispatcher: DDatanode:10.22.6.22:50010 activateDelay 10.0 seconds
[balancer] 16/02/24 21:11:56 INFO balancer.Dispatcher: Failed to find a pending move 5 times. Skipping 10.22.4.46:50010:DISK
[balancer] 16/02/24 21:11:56 INFO balancer.Dispatcher: Failed to find a pending move 5 times. Skipping 10.22.4.44:50010:DISK
[balancer] 16/02/24 21:11:57 INFO balancer.Dispatcher: Failed to find a pending move 5 times. Skipping 10.22.4.43:50010:DISK
[balancer] 16/02/24 21:11:57 INFO balancer.Dispatcher: Failed to find a pending move 5 times. Skipping 10.22.4.41:50010:DISK

1 ACCEPTED SOLUTION

Super Collaborator

Finally have the balancer running fairly well.

In the end, we were not able to get good results using the UI link from Ambari.

Running via CLI with some in-line parameters is working well for us:

hdfs balancer -Ddfs.balancer.movedWinWidth=5400000 -Ddfs.balancer.moverThreads=1000 -Ddfs.balancer.dispatcherThreads=200 -Ddfs.datanode.balance.bandwidthPerSec=100000000 -Ddfs.balancer.max-size-to-move=10737418240 -threshold 20 1>/tmp/balancer-out.log 2>/tmp/balancer-debug.log 
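For anyone reading the raw numbers in that command, here is a small sketch that converts three of them into friendlier units. The variable names are illustrative only, and the units (milliseconds for movedWinWidth, bytes per second for bandwidthPerSec, bytes for max-size-to-move) are my understanding of these properties, so check them against the documentation for your Hadoop version.

```shell
#!/bin/sh
# Values copied from the balancer command above.
# Variable names are illustrative, not Hadoop settings.
moved_win_width_ms=5400000
bandwidth_bytes_per_sec=100000000
max_size_to_move_bytes=10737418240

# Assumed units: milliseconds, bytes per second, bytes.
moved_win_width_min=$(( moved_win_width_ms / 1000 / 60 ))
bandwidth_mb_per_sec=$(( bandwidth_bytes_per_sec / 1000 / 1000 ))
max_size_to_move_gib=$(( max_size_to_move_bytes / 1024 / 1024 / 1024 ))

echo "movedWinWidth:    ${moved_win_width_min} minutes"
echo "bandwidthPerSec:  ${bandwidth_mb_per_sec} MB/s"
echo "max-size-to-move: ${max_size_to_move_gib} GiB"
```

This only does arithmetic on the values; it does not talk to the cluster, so it is safe to run anywhere with a POSIX shell.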


REPLIES

Master Guru

http://stackoverflow.com/questions/25222633/hadoop-balancer-command-warn-messages-threads-quota-is-e...

Which version of Hadoop are you using? The answer seems pretty complete.

Master Mentor

They're on HDP 2.2.8, which is Hadoop 2.6.

Super Collaborator

Right. We should already have this bug fix.

Master Guru

Perhaps change the thread number parameter mentioned in the link? Still weird that it doesn't move at all.

Super Collaborator

Interestingly, I don't see a dfs.datanode.balance.max.concurrent.moves property in our config.

I do see dfs.datanode.balance.bandwidthPerSec

Any idea what a good value for dfs.datanode.balance.max.concurrent.moves would be?

I suppose it's a function of the number of data nodes.

Master Mentor

@Zack Riesland and @Benjamin Leonhardi, I would leave it alone and let the balancer complete. If you add the property discussed, it will most likely kill the balancer that is running now. The message is just a warning: it means a node cannot run more than 5 block-move threads at once. Changing this property will probably trigger a restart, so test it in a dev cluster first. You can add it as a custom property in hdfs-site.xml.
