Support Questions

yagoaparecidoti · ‎04-04-2022

We are running rebalance on HDFS and we are getting the error below:

"WARN balancer.Dispatcher: Failed to move blk_1314197946_240487461 with size=134217728 from 10.10.10.109:1019:DISK to 10.10.10.183:1019:DISK through 10.10.10.99:1019: Got error, status message opReplaceBlock BP-1707938289-10.10.10.96-1520510791093:blk_1314197946_240487461 received exception java.io.IOException: Got error, status message Not able to copy block 1314197946 to /10.10.10.183:51650 because threads quota is exceeded., copy block BP-1707938289-10.10.10.96-1520510791093:blk_1314197946_240487461 from /10.10.10.99:1019, block move is failed"

We would like to know why we are getting these errors.

Could it be because the network interfaces are 1 Gbps?

If so, how can we resolve these errors? By increasing the speed of the network interfaces?

PS: The hosts have 1 Gbps network interfaces.

PS: We are using Ambari Server version 2.6.2.2.

Akarsh · ‎04-05-2022

Hello,

The error occurs because the thread quota for block moves is exhausted on the DataNode (DN) side. This can usually be controlled with the balancer parameters.

Please refer to:

https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/data-storage/content/properties_for_configurin...

Ideally, changing the value of "dfs.datanode.balance.max.concurrent.moves" should resolve the issue.

Network bandwidth can become an issue when moving large volumes of data, but according to this error the problem is the thread quota.

yagoaparecidoti · ‎04-05-2022

hi @Akarsh

The documentation doesn't show how to use the parameter "dfs.datanode.balance.max.concurrent.moves" in the balancer execution command.

How can we use this parameter when running the balancer?

Akarsh · ‎04-05-2022

There are two ways: add it directly to hdfs-site.xml, or trigger the balancer with these parameters, like this:

nohup hdfs balancer -Ddfs.balancer.moverThreads=300 -Ddfs.datanode.balance.max.concurrent.moves=20 -Ddfs.datanode.balance.bandwidthPerSec=20480000 -Ddfs.balancer.dispatcherThreads=400 -Ddfs.balancer.max-size-to-move=100737418240 -threshold 10 >/tmp/new_balancer1.out
This runs the balancer with non-default values, so the balancing operation finishes much more quickly.
** Be aware that a run using the above command and parameters will cause high bandwidth usage and a lot of I/O storms.

For more details on the parameters mentioned above, please refer to the doc below:

https://hadoop.apache.org/docs/r2.9.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
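
A minimal sketch of the command-line route described above, not a recommended configuration. The helper name build_balancer_cmd is hypothetical; it only assembles an "hdfs balancer" command line from property=value pairs and prints it for review, without running anything. The property names and values are the ones quoted in this reply.

```shell
#!/bin/sh
# build_balancer_cmd THRESHOLD PROP=VALUE...
# Hypothetical helper: prints an "hdfs balancer" command line; it does not run it.
build_balancer_cmd() {
  bb_threshold="$1"
  shift
  bb_cmd="hdfs balancer"
  # Turn every PROP=VALUE argument into a -DPROP=VALUE option.
  for bb_kv in "$@"; do
    bb_cmd="$bb_cmd -D$bb_kv"
  done
  printf '%s -threshold %s\n' "$bb_cmd" "$bb_threshold"
}

# Values copied from the command quoted above; tune them for your cluster.
build_balancer_cmd 10 \
  dfs.balancer.moverThreads=300 \
  dfs.datanode.balance.max.concurrent.moves=20 \
  dfs.datanode.balance.bandwidthPerSec=20480000 \
  dfs.balancer.dispatcherThreads=400 \
  dfs.balancer.max-size-to-move=100737418240
```

To launch the printed command for real, it could be run under nohup with "2>&1 &" appended to the redirection, on the assumption that the balancer writes its log lines to stderr in your log4j setup.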

yagoaparecidoti · ‎04-05-2022

hi @Akarsh

The balancer was run like this:

hdfs balancer -Ddfs.datanode.balance.bandwidthPerSec=1073741824 -Ddfs.datanode.balance.max.concurrent.moves=20 -threshold 5

But the same error still appeared:

WARN balancer.Dispatcher: Failed to move blk_1275620781_201901979 with size=99352364 from 10.10.10.99:1019:DISK to 10.10.10.183:1019:DISK through 10.10.10.99:1019: Got error, status message Not able to receive block 1275620781 from /10.10.10.212:44466 because threads quota is exceeded., block move is failed
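
As a quick sanity check on the bandwidth value used in this run, the arithmetic below compares it with the link speed, assuming the hosts' network interfaces run at 1 Gbit/s. It is only a sketch of the unit conversion; the variable names are illustrative.

```shell
#!/bin/sh
# Bandwidth value passed to the balancer in the run above (bytes per second).
balancer_bw=1073741824

# Assumption: the NICs are 1 Gbit/s, i.e. 1,000,000,000 bits/s = 125,000,000 bytes/s.
nic_bw=$((1000000000 / 8))

echo "balancer cap: $((balancer_bw / 1024 / 1024)) MiB/s"
echo "1 Gbit/s NIC: $((nic_bw / 1000 / 1000)) MB/s"

# If the configured cap is above what the link can carry, the link is the real limit.
if [ "$balancer_bw" -gt "$nic_bw" ]; then
  echo "cap exceeds link speed"
fi
```

Under that assumption the configured cap (1 GiB/s) is roughly eight times what the link can carry. Separately, and worth verifying for your Hadoop version: both "dfs.datanode.*" properties appear to be read by the DataNodes from their own configuration, so passing them with -D to the balancer client may not change the DataNode-side thread quota. "hdfs dfsadmin -setBalancerBandwidth" is the usual way to change the bandwidth on running DataNodes.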

shubham_sharma · ‎04-06-2022

Hi @yagoaparecidoti

You need not worry about this warning, as the balancer will re-attempt to move these blocks during the balancer job run.

For example, check the logs after several hours to confirm that the movement of block "blk_1275620781" has completed.
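
One way to do that check is sketched below. It assumes the balancer output was captured to a file (the path /tmp/new_balancer1.out is the one used earlier in this thread) and that successful moves are logged as "Successfully moved blk_...", mirroring the "Failed to move blk_..." warnings shown above; confirm the exact wording in your own logs. The function name block_status is hypothetical.

```shell
#!/bin/sh
# block_status LOGFILE BLOCK_ID
# Counts the log lines that report a failed or a successful move of one block.
# BLOCK_ID is given without the generation stamp, e.g. blk_1275620781.
block_status() {
  bs_log="$1"
  bs_blk="$2"
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  bs_failed=$(grep -cF "Failed to move ${bs_blk}_" "$bs_log" || true)
  bs_moved=$(grep -cF "Successfully moved ${bs_blk}_" "$bs_log" || true)
  echo "failed=${bs_failed} moved=${bs_moved}"
}

# Only run the check if the captured balancer output exists on this host.
if [ -f /tmp/new_balancer1.out ]; then
  block_status /tmp/new_balancer1.out blk_1275620781
fi
```

A non-zero "moved" count after earlier failures would indicate that the retry eventually succeeded.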

yagoaparecidoti · ‎04-06-2022

hi @shubham_sharma, how are you?

Thanks for the info.

We will rerun the balancer on HDFS and monitor the behavior.

If any errors appear, we will report back here.

DianaTorres · ‎04-12-2022

@yagoaparecidoti Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks!

Regards,

Diana Torres,
Community Moderator
