
Subsequent HDFS Rebalancer Runs Slower

Hello,

For the last couple of days I have been tuning the HDFS balancer on our cluster. The cluster had a large disparity in data distribution because the balancer had not been run, and about 150 TB of data needs to move to rebalance it.

I managed to tune the settings to move around 180 GB every 5 minutes, so the rebalance should finish in about 3 days. The job failed over the weekend because the Kerberos ticket expired, and after I restarted it the transfer rate dropped to around 60 GB every 5 minutes. I restarted the balancer again and now it is around 25 GB every 5 minutes. At this rate it will take over 2 weeks to balance the cluster.
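As a rough check on these estimates, here is a minimal sketch of the arithmetic. The function name `days_to_rebalance` is made up for illustration, and it assumes a steady transfer rate and binary units (1 TB = 1024 GB), neither of which is guaranteed in practice.

```python
GB_PER_TB = 1024  # assumption: binary units


def days_to_rebalance(remaining_tb, gb_per_interval, interval_minutes=5.0):
    """Estimate the days needed to move remaining_tb at a steady rate."""
    intervals = remaining_tb * GB_PER_TB / gb_per_interval
    return intervals * interval_minutes / (60 * 24)


# The three rates observed across the balancer runs, for ~150 TB to move.
for rate_gb in (180, 60, 25):
    days = days_to_rebalance(150, rate_gb)
    print(f"{rate_gb:>3} GB per 5 min: {days:.1f} days")
```

For 150 TB this works out to roughly 3, 9, and 21 days respectively, so the drop from 180 GB to 25 GB per interval is the difference between a few days and several weeks.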

I am running the balancer with logging at INFO level and do not see any block move failures in any iteration. I am also monitoring disk and network I/O during the balancer runs, and we are nowhere near saturating either.


Has anyone seen the balancer behave like this?


For reference, these are the settings I'm using:

balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)

balancer.Balancer: dfs.balancer.moverThreads = 12000 (default=1000)

balancer.Balancer: dfs.balancer.dispatcherThreads = 400 (default=200)

balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 32 (default=5)

balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)

balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 104857600 (default=10485760)

balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)

balancer.Balancer: dfs.blocksize = 268435456 (default=134217728)

hdfs dfsadmin -setBalancerBandwidth 50000000
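One thing worth checking against these settings is the per-DataNode bandwidth cap. `-setBalancerBandwidth` takes bytes per second per DataNode, so 50000000 limits each node to about 50 MB/s of balancing traffic. The sketch below is only an illustration: the helper names are hypothetical, it uses decimal GB, and it assumes every participating node transfers at the full cap.

```python
import math


def cap_gb_per_window(bandwidth_bytes_per_sec, window_seconds=300):
    """Upper bound on data one DataNode can move in a window (decimal GB)."""
    return bandwidth_bytes_per_sec * window_seconds / 1e9


def min_nodes_for_rate(target_gb, bandwidth_bytes_per_sec, window_seconds=300):
    """Fewest DataNodes that must run at the cap to reach target_gb per window."""
    per_node = cap_gb_per_window(bandwidth_bytes_per_sec, window_seconds)
    return math.ceil(target_gb / per_node)


bandwidth = 50_000_000  # value passed to -setBalancerBandwidth
print(cap_gb_per_window(bandwidth))        # GB per node per 5 minutes
print(min_nodes_for_rate(180, bandwidth))  # nodes needed for the fastest run
print(min_nodes_for_rate(25, bandwidth))   # nodes needed for the slowest run
```

Under those assumptions each node can move at most 15 GB per 5 minutes, so the 180 GB runs needed at least 12 nodes transferring at the cap, while the 25 GB runs could be explained by only a couple of nodes taking part.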


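Here is some of the output from three balancer runs, all with the same settings. The columns are time stamp, iteration number, bytes already moved, bytes left to move, and bytes being moved.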

***************************************************************************************************

May 25, 2019 6:00:19 AM 0 186.66 GB 137.33 TB 300 GB

May 25, 2019 6:05:10 AM 1 358.00 GB 137.27 TB 300 GB

May 25, 2019 6:09:00 AM 2 536.24 GB 137.10 TB 300 GB

***************************************************************************************************

May 27, 2019 4:01:46 AM 0 63.69 GB 112.29 TB 300 GB

May 27, 2019 4:05:33 AM 1 132.05 GB 112.28 TB 300 GB

May 27, 2019 4:09:24 AM 2 213.01 GB 112.25 TB 300 GB

***************************************************************************************************

May 28, 2019 4:35:39 PM 0 25.34 GB 95.48 TB 300 GB

May 28, 2019 4:39:34 PM 1 54.76 GB 95.46 TB 300 GB

May 28, 2019 4:43:23 PM 2 88.69 GB 95.45 TB 300 GB

***************************************************************************************************
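To compare the runs on actual elapsed time rather than an assumed 5-minute interval, the rows above can be turned into per-iteration rates. This is a minimal sketch that assumes each row follows the layout shown (time stamp, iteration, bytes already moved, bytes left to move, bytes being moved) and an English locale for the month and AM/PM markers. The function names are illustrative only.

```python
from datetime import datetime

UNIT_TO_GB = {"GB": 1.0, "TB": 1024.0}  # assumption: binary units


def parse_line(line):
    """Split one balancer output row into (time, iteration, moved_gb, left_gb)."""
    tokens = line.split()
    # The first five tokens form the time stamp, e.g. "May 28, 2019 4:35:39 PM".
    stamp = datetime.strptime(" ".join(tokens[:5]), "%b %d, %Y %I:%M:%S %p")
    iteration = int(tokens[5])
    moved_gb = float(tokens[6]) * UNIT_TO_GB[tokens[7]]
    left_gb = float(tokens[8]) * UNIT_TO_GB[tokens[9]]
    return stamp, iteration, moved_gb, left_gb


def rates_gb_per_min(lines):
    """Rate between consecutive rows: delta of bytes moved over delta of time."""
    rows = [parse_line(line) for line in lines]
    rates = []
    for (t0, _, m0, _), (t1, _, m1, _) in zip(rows, rows[1:]):
        minutes = (t1 - t0).total_seconds() / 60
        rates.append((m1 - m0) / minutes)
    return rates


run_may_28 = [
    "May 28, 2019 4:35:39 PM 0 25.34 GB 95.48 TB 300 GB",
    "May 28, 2019 4:39:34 PM 1 54.76 GB 95.46 TB 300 GB",
    "May 28, 2019 4:43:23 PM 2 88.69 GB 95.45 TB 300 GB",
]
print([round(rate, 1) for rate in rates_gb_per_min(run_may_28)])
```

Running the same function over the May 25 and May 27 rows gives directly comparable GB-per-minute figures for each run.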


Thanks,

Joey