Created 04-04-2022 12:00 PM
We are running the HDFS balancer and are getting the error below:
"WARN balancer.Dispatcher: Failed to move blk_1314197946_240487461 with size=134217728 from 10.10.10.109:1019:DISK to 10.10.10.183:1019:DISK through 10.10.10.99:1019: Got error, status message opReplaceBlock BP-1707938289-10.10.10.96-1520510791093:blk_1314197946_240487461 received exception java.io.IOException: Got error, status message Not able to copy block 1314197946 to /10.10.10.183:51650 because threads quota is exceeded., copy block BP-1707938289-10.10.10.96-1520510791093:blk_1314197946_240487461 from /10.10.10.99:1019, block move is failed"
Why are we getting these errors?
Could it be because the network interfaces are 1GB?
If so, how can we resolve them? By increasing the speed of the network interfaces?
PS: The hosts have 1GB network interfaces
PS: We are using Ambari Server version 2.6.2.2.
Created 04-05-2022 12:37 AM
Hello,
The error is caused by an exhausted thread quota on the DataNode (DN) side. This can usually be controlled through the balancer parameters.
Kindly refer
Changing the value of "dfs.datanode.balance.max.concurrent.moves" should resolve the issue.
Network bandwidth can become a bottleneck when moving large volumes of data, but this error points to the thread quota, not the network.
Created 04-05-2022 05:33 AM
hi @Akarsh
The documentation doesn't show how to pass the parameter "dfs.datanode.balance.max.concurrent.moves" in the balancer command.
How can we use this parameter when running the balancer?
Created 04-05-2022 06:26 AM
There are two ways: add the parameter directly to hdfs-site.xml, or pass it on the command line when starting the balancer, for example:
nohup hdfs balancer -Ddfs.balancer.moverThreads=300 -Ddfs.datanode.balance.max.concurrent.moves=20 -Ddfs.datanode.balance.bandwidthPerSec=20480000 -Ddfs.balancer.dispatcherThreads=400 -Ddfs.balancer.max-size-to-move=100737418240 -threshold 10 >/tmp/new_balancer1.out
This runs the balancer with non-default values, so the balancing operation will finish much more quickly.
** Be aware that running with the command and parameters above will cause high bandwidth usage and heavy I/O load.
For more details on the parameters mentioned above, please refer to the documentation below:
https://hadoop.apache.org/docs/r2.9.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
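As a rough sanity check, the bandwidthPerSec value can be compared with the link speed. The sketch below assumes the value is given in bytes per second (as hdfs-default.xml describes it) and that a "1GB" interface means roughly 1000 Mbit/s; bytes_to_mbit is a hypothetical helper, not part of Hadoop.

```shell
# Convert a dfs.datanode.balance.bandwidthPerSec value (bytes per second)
# into whole megabits per second (integer division, rounds down).
# bytes_to_mbit is a hypothetical helper name used only for this illustration.
bytes_to_mbit() {
  echo $(( $1 * 8 / 1000000 ))
}

bytes_to_mbit 20480000      # value from the command above; prints 163
bytes_to_mbit 1073741824    # 1 GiB/s, for comparison; prints 8589
```

So 20480000 bytes/s is roughly 163 Mbit/s per DataNode, which fits within a 1000 Mbit/s link, whereas a setting of 1 GiB/s would be far more than such a link can carry.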
Created 04-05-2022 07:12 AM
hi @Akarsh
The balancer was run like this:
hdfs balancer -Ddfs.datanode.balance.bandwidthPerSec=1073741824 -Ddfs.datanode.balance.max.concurrent.moves=20 -threshold 5
But the same error still appeared:
WARN balancer.Dispatcher: Failed to move blk_1275620781_201901979 with size=99352364 from 10.10.10.99:1019:DISK to 10.10.10.183:1019:DISK through 10.10.10.99:1019: Got error, status message Not able to receive block 1275620781 from /10.10.10.212:44466 because threads quota is exceeded., block move is failed
Created 04-06-2022 11:39 AM
You need not worry about this warning; the balancer will retry moving these blocks during the job run.
For example, check the logs after several hours to see whether the move of block "blk_1275620781" has completed.
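One way to do that check is to search the saved balancer output for the block ID. This is only a sketch: it assumes the balancer output was redirected to a file (the path /tmp/new_balancer1.out below is taken from the earlier command and may differ on your system), and count_failed_moves and show_block_history are hypothetical helper names.

```shell
# Count the "Failed to move" warnings logged for one block.
# $1 = path to the saved balancer output, $2 = block ID such as blk_1275620781
count_failed_moves() {
  grep -c "Failed to move $2" "$1"
}

# Print every log line that mentions the block, in the order it was logged.
show_block_history() {
  grep "$2" "$1"
}

# Example usage (the log path is an assumption; adjust it to your setup):
# count_failed_moves /tmp/new_balancer1.out blk_1275620781
# show_block_history /tmp/new_balancer1.out blk_1275620781
```

If the failure count stops growing between checks and later lines for the same block no longer report a failure, the retry has most likely gone through.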
Created 04-06-2022 12:55 PM
hi @shubham_sharma, how are you?
Thanks for the info.
We will rerun the balancer on HDFS and monitor its behavior.
If any errors appear, we will report back here.
Created 04-12-2022 02:14 PM
@yagoaparecidoti Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks!
Regards,
Diana Torres,