Help with exception from HDFS balancer

Super Collaborator

We recently tried to run the HDFS balancer for the first time.

(Somehow we've been using HDP for almost 2 years and never knew that we should be doing this)

After about an hour, it showed "5.56 GB moved / 12.72 TB left / 40 GB being processed".

Now (10 hours later), it still says the same thing. Does anyone know what the issue or solution here is?

We're on 2.2.8 running in HA. There's nothing in stderr.

stdout is below.

[balancer] 16/02/24 21:11:51 WARN balancer.Dispatcher: Failed to move blk_1083404960_9672884 with size=134217728 from 10.22.4.44:50010:DISK to 10.22.6.22:50010:DISK through 10.22.4.46:50010: block move is failed: Not able to receive block 1083404960 from /10.22.4.64:38809 because threads quota is exceeded.
[balancer] 16/02/24 21:11:51 INFO balancer.Dispatcher: DDatanode:10.22.4.46:50010 activateDelay 10.0 seconds
16/02/24 21:11:51 WARN balancer.Dispatcher: Failed to move blk_1083404871_9672795 with size=134217728 from 10.22.4.44:50010:DISK to 10.22.6.22:50010:DISK through 10.22.4.44:50010: block move is failed: Not able to receive block 1083404871 from /10.22.4.64:38810 because threads quota is exceeded.
16/02/24 21:11:51 INFO balancer.Dispatcher: DDatanode:10.22.6.22:50010 activateDelay 10.0 seconds
[balancer] 16/02/24 21:11:51 INFO balancer.Dispatcher: DDatanode:10.22.4.44:50010 activateDelay 10.0 seconds
16/02/24 21:11:51 INFO balancer.Dispatcher: DDatanode:10.22.6.22:50010 activateDelay 10.0 seconds
[balancer] 16/02/24 21:11:56 INFO balancer.Dispatcher: Failed to find a pending move 5 times. Skipping 10.22.4.46:50010:DISK
[balancer] 16/02/24 21:11:56 INFO balancer.Dispatcher: Failed to find a pending move 5 times. Skipping 10.22.4.44:50010:DISK
[balancer] 16/02/24 21:11:57 INFO balancer.Dispatcher: Failed to find a pending move 5 times. Skipping 10.22.4.43:50010:DISK
[balancer] 16/02/24 21:11:57 INFO balancer.Dispatcher: Failed to find a pending move 5 times. Skipping 10.22.4.41:50010:DISK
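
To gauge how much of the balancer output consists of the quota warning versus skipped source nodes, the matching lines can be counted. This is only a minimal sketch: it assumes the output was saved to a file with one log entry per line, and the function names are made up for illustration.

```shell
#!/bin/sh
# Count log lines that report the datanode thread-quota warning.
# Usage: count_quota_warnings <logfile>   (hypothetical helper name)
count_quota_warnings() {
  # grep -c prints the number of matching lines. It exits with status 1
  # when nothing matches, so "|| true" keeps the helper safe under "set -e".
  grep -c 'threads quota is exceeded' "$1" || true
}

# Count how often the dispatcher gave up on a source node.
# Usage: count_skipped_sources <logfile>   (hypothetical helper name)
count_skipped_sources() {
  grep -c 'Failed to find a pending move' "$1" || true
}
```

For example, `count_quota_warnings balancer.log` prints the number of warning lines in a saved log, where `balancer.log` is a placeholder path.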

Accepted Solutions

Re: Help with exception from HDFS balancer

Super Collaborator

Finally have the balancer running fairly well.

In the end, we were not able to get good results using the UI link from Ambari.

Running via CLI with some in-line parameters is working well for us:

hdfs balancer -Ddfs.balancer.movedWinWidth=5400000 -Ddfs.balancer.moverThreads=1000 -Ddfs.balancer.dispatcherThreads=200 -Ddfs.datanode.balance.bandwidthPerSec=100000000 -Ddfs.balancer.max-size-to-move=10737418240 -threshold 20 1>/tmp/balancer-out.log 2>/tmp/balancer-debug.log 
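
The one-line command above is easier to review when each option sits on its own line. The sketch below wraps it in a function so it can be printed before it is run. The values are copied from the post and are not tuning recommendations; the function name and the optional wrapper argument are assumptions made for illustration.

```shell
#!/bin/sh
# Run (or preview) the balancer invocation from the post.
# Any arguments are placed in front of the command, so "balancer_cmd echo"
# prints the command line instead of executing it.
#
# Option notes (values taken from the post):
#   dfs.balancer.movedWinWidth=5400000         milliseconds (90 minutes)
#   dfs.balancer.max-size-to-move=10737418240  bytes (10 GiB)
#   dfs.datanode.balance.bandwidthPerSec       bytes per second
#   -threshold 20                              percent of disk capacity
balancer_cmd() {
  "$@" hdfs balancer \
    -Ddfs.balancer.movedWinWidth=5400000 \
    -Ddfs.balancer.moverThreads=1000 \
    -Ddfs.balancer.dispatcherThreads=200 \
    -Ddfs.datanode.balance.bandwidthPerSec=100000000 \
    -Ddfs.balancer.max-size-to-move=10737418240 \
    -threshold 20
}
```

`balancer_cmd echo` gives a dry run. On a host with the Hadoop client installed, `balancer_cmd 1>/tmp/balancer-out.log 2>/tmp/balancer-debug.log` should behave like the original one-liner.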

Re: Help with exception from HDFS balancer

http://stackoverflow.com/questions/25222633/hadoop-balancer-command-warn-messages-threads-quota-is-e...

Which version of Hadoop are you using? The answer there seems pretty complete.

Re: Help with exception from HDFS balancer

Mentor

They're on HDP 2.2.8, which is Hadoop 2.6.

Re: Help with exception from HDFS balancer

Super Collaborator

Right. We should already have this bug fix.

Re: Help with exception from HDFS balancer

Perhaps change the thread number parameter mentioned in the link? Still weird that it doesn't move at all.

Re: Help with exception from HDFS balancer

Super Collaborator

Interestingly, I don't see a dfs.datanode.balance.max.concurrent.moves property in our config.

I do see dfs.datanode.balance.bandwidthPerSec

Any idea what a good value for dfs.datanode.balance.max.concurrent.moves would be?

I suppose it's a function of the number of data nodes.
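
When a property does not appear in the managed config, it can help to ask the client which value is actually in effect. A minimal sketch, assuming the `hdfs getconf -confKey` subcommand is available on the host; the function name and the `HDFS_BIN` override variable are hypothetical and exist only so the call can be pointed at a different binary.

```shell
#!/bin/sh
# Print the value the HDFS client resolves for the concurrent-moves property.
# HDFS_BIN is a hypothetical override; it defaults to the "hdfs" launcher.
show_concurrent_moves() {
  "${HDFS_BIN:-hdfs}" getconf -confKey dfs.datanode.balance.max.concurrent.moves
}
```

Note that this reads the configuration visible to the host where it runs, which may differ from what each datanode has loaded.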

Re: Help with exception from HDFS balancer

Mentor

@Zack Riesland and @Benjamin Leonhardi, I would leave it alone and let the balancer complete. If you add the property discussed now, it will most likely kill the running balancer. The message is only a warning: it means a datanode cannot run more than 5 move threads at once. Changing this property will probably trigger a restart, so test it in a dev cluster first. You can add it as a custom property in hdfs-site.xml.
