Rebalance HDFS - Not able to copy block 1314213952 because threads quota is exceeded

Expert Contributor

We are running rebalance on HDFS and we are getting the error below:

 

"WARN balancer.Dispatcher: Failed to move blk_1314197946_240487461 with size=134217728 from 10.10.10.109:1019:DISK to 10.10.10.183:1019:DISK through 10.10.10.99:1019: Got error, status message opReplaceBlock BP-1707938289-10.10.10.96-1520510791093:blk_1314197946_240487461 received exception java.io.IOException: Got error, status message Not able to copy block 1314197946 to /10.10.10.183:51650 because threads quota is exceeded., copy block BP-1707938289-10.10.10.96-1520510791093:blk_1314197946_240487461 from /10.10.10.99:1019, block move is failed"

 

We would like to know why we are getting these errors.

Could it be because the network interfaces are only 1 Gbps?

If so, how can we resolve these errors? By increasing the speed of the network interfaces?

PS: The hosts have 1 Gbps network interfaces.

PS: We are using Ambari Server version 2.6.2.2.

Cloudera Employee

Hello,

The error means that the thread quota for block moves has been exhausted on the DataNode (DN) side. This can usually be controlled with the balancer parameters.

Please refer to:

https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/data-storage/content/properties_for_configurin...

Changing the value of "dfs.datanode.balance.max.concurrent.moves" should resolve the issue.

Network bandwidth can become a bottleneck when moving large volumes of data, but this error points to the thread quota, not the network.
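
To see how often the quota is actually being hit, the captured balancer output can be tallied. The sketch below assumes the log line layout quoted in this thread (other Hadoop versions may word it differently) and groups failures by whether the refusal says "copy" or "receive". The function name `quota_failures_by_kind` is hypothetical.

```shell
#!/usr/bin/env bash
# Tally "threads quota is exceeded" failures in captured balancer output,
# grouped by the wording of the refusal ("copy" or "receive").
# Assumes the log line layout quoted in this thread.
quota_failures_by_kind() {
  local log_file="$1"
  grep -E 'Failed to move .*Not able to (copy|receive) block .*threads quota is exceeded' "$log_file" \
    | sed -E 's/.*Not able to (copy|receive) block.*/\1/' \
    | sort | uniq -c
}
```

Usage, assuming the balancer output was saved to a file: `quota_failures_by_kind /path/to/balancer.out`. A count that stays small relative to the number of blocks moved suggests the failures are transient.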

Expert Contributor

Hi @Akarsh,

The documentation doesn't show how to pass the parameter "dfs.datanode.balance.max.concurrent.moves" on the balancer command line.

How can we use this parameter when running the balancer?

Cloudera Employee

There are two ways. One is to add it directly to hdfs-site.xml; the other is to start the balancer with the parameters on the command line, like this:

nohup hdfs balancer -Ddfs.balancer.moverThreads=300 -Ddfs.datanode.balance.max.concurrent.moves=20 -Ddfs.datanode.balance.bandwidthPerSec=20480000 -Ddfs.balancer.dispatcherThreads=400 -Ddfs.balancer.max-size-to-move=100737418240 -threshold 10 >/tmp/new_balancer1.out 2>&1 &

This runs the balancer with non-default values, so the balancing operation finishes much more quickly.
** Be aware that a run with the above command and parameters will cause high bandwidth usage and a heavy I/O load.

For more details on the parameters mentioned above, please refer to the documentation below:
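
For the hdfs-site.xml route, the entry is an ordinary `<property>` stanza. The helper below is a hypothetical convenience (the name `hdfs_site_property` is not part of Hadoop) that only prints the stanza and touches no cluster. One assumption worth checking against the documentation for your version: the `dfs.datanode.*` values are read by the DataNodes themselves, so they may need to be set in the DataNodes' configuration (through Ambari on a managed cluster) and the DataNodes restarted before the new quota takes effect.

```shell
#!/usr/bin/env bash
# Print an hdfs-site.xml <property> stanza for a given name and value.
# This only writes text to stdout; applying it to a cluster is a separate step.
hdfs_site_property() {
  local name="$1" value="$2"
  cat <<EOF
<property>
  <name>${name}</name>
  <value>${value}</value>
</property>
EOF
}

# Example: the setting discussed in this thread, with the value used above.
hdfs_site_property dfs.datanode.balance.max.concurrent.moves 20
```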

https://hadoop.apache.org/docs/r2.9.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

 

Expert Contributor

Hi @Akarsh,

 

The balancer was run like this:

 

hdfs balancer -Ddfs.datanode.balance.bandwidthPerSec=1073741824 -Ddfs.datanode.balance.max.concurrent.moves=20 -threshold 5
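
As a side note on the bandwidth value: `dfs.datanode.balance.bandwidthPerSec` is expressed in bytes per second, so it can be compared against the NIC speed with simple arithmetic. A minimal sketch, assuming the network interfaces mentioned in the question are 1 Gbit/s links (roughly 1000 Mbit/s); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Convert a bandwidth given in bytes per second to whole megabits per second
# (integer division, so the result is rounded down).
bytes_per_sec_to_mbit() {
  echo $(( $1 * 8 / 1000000 ))
}

bytes_per_sec_to_mbit 1073741824   # value used in the command above
bytes_per_sec_to_mbit 20480000     # value suggested earlier in the thread
```

Under that assumption, the first value is well above what a 1 Gbit/s link can carry, so it would not throttle the balancer in practice. This is separate from the thread quota error, which is governed by the concurrent-moves setting.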

 

But the same error still appeared:

 

WARN balancer.Dispatcher: Failed to move blk_1275620781_201901979 with size=99352364 from 10.10.10.99:1019:DISK to 10.10.10.183:1019:DISK through 10.10.10.99:1019: Got error, status message Not able to receive block 1275620781 from /10.10.10.212:44466 because threads quota is exceeded., block move is failed

Expert Contributor

Hi @yagoaparecidoti,

You need not worry about this warning: the balancer will retry moving these blocks later in the same run.

For example, check the logs after several hours to see whether the move of block "blk_1275620781" has completed.
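
One way to do that check is to pull every line that mentions the block out of the captured balancer output. This is a sketch that assumes the output, including warnings, was redirected to a file; the function name `block_history` is hypothetical.

```shell
#!/usr/bin/env bash
# Print every line (with line numbers) in a captured balancer log
# that mentions the given numeric block ID.
block_history() {
  local block_id="$1" log_file="$2"
  grep -n -- "blk_${block_id}_" "$log_file"
}
```

Usage, with the file name from the earlier command in this thread: `block_history 1275620781 /tmp/new_balancer1.out`. If the only lines shown are "Failed to move" warnings and no later line reports the block, the move has not yet been retried successfully.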

Expert Contributor

Hi @shubham_sharma, how are you?

Thanks for the info.

We will rerun the balancer on HDFS and monitor its behavior.

If any errors appear, we will report back here.

Community Manager

@yagoaparecidoti Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.  Thanks!


Regards,

Diana Torres,
Community Moderator

