Created on 08-19-2014 12:36 PM - edited 09-16-2022 02:05 AM
New HDFS balancer command is slow (tried via CM as well; same result)
"hdfs balancer" command is slow to move data around to balance the nodes.
I have tried it with different thresholds and policies.
The old HDFS balancer in CDH 4.1.3 is pretty fast: it actually moves data at the speed specified by dfs.balance.bandwidthPerSec. The old cluster moves data around rapidly, i.e. the balancer thread doesn't sleep.
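For reference, the bandwidth property takes a plain integer number of bytes per second, not a string like "1GB". The sketch below converts a human-readable rate and renders the hdfs-site.xml entry. Two assumptions: "1GB" is read as 1 GiB (2^30 bytes), and the property name defaults to the one used in this thread. To my knowledge, Hadoop 2 based releases such as CDH 5 renamed the key to dfs.datanode.balance.bandwidthPerSec and keep the old name as a deprecated alias, so the name is a parameter here.

```python
import xml.etree.ElementTree as ET

# Binary multipliers; assumption: "1GB" means 1 GiB = 2**30 bytes.
UNITS = {"B": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30}


def to_bytes_per_sec(amount: float, unit: str) -> int:
    """Convert e.g. (1, "GB") into an integer byte count per second."""
    return int(amount * UNITS[unit.upper()])


def bandwidth_property(bytes_per_sec: int,
                       name: str = "dfs.balance.bandwidthPerSec") -> str:
    """Render one hdfs-site.xml <property> element as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(bytes_per_sec)
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Prints the <property> element with value 1073741824 (1 GiB/s).
    print(bandwidth_property(to_bytes_per_sec(1, "GB")))
```

Pass name="dfs.datanode.balance.bandwidthPerSec" to emit the newer key instead.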
The new one, however, seems either to ignore it or to have it superseded by another value that is not tunable. The balancer sleeps for 35 s, moves a few blocks, then sleeps again.
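A back-of-the-envelope way to see why the bandwidth cap stops mattering under this pattern: if each round moves a fixed amount and then idles, the average rate is bytes moved divided by (move time + sleep time). The round size below is purely hypothetical, only the 35 s sleep comes from the observation above, and this is illustrative arithmetic, not a description of the balancer's internal logic.

```python
MB = 2**20
GB = 2**30


def effective_rate(bytes_per_round: float, link_rate: float, sleep_s: float) -> float:
    """Average throughput (bytes/s) when each round moves `bytes_per_round`
    at `link_rate` bytes/s and then idles for `sleep_s` seconds."""
    move_s = bytes_per_round / link_rate
    return bytes_per_round / (move_s + sleep_s)


if __name__ == "__main__":
    # Hypothetical: 512 MiB per round at a 1 GiB/s cap, then a 35 s sleep.
    rate = effective_rate(512 * MB, 1 * GB, 35.0)
    print(f"{rate / MB:.1f} MB/s")
```

With these assumed numbers the move itself takes 0.5 s, so the 35 s sleep dominates and raising the cap barely changes the average.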
I have dfs.balance.bandwidthPerSec set to 1 GB/s (or 10Gb), yet the cluster only gets balanced at 15 MB/s, which is really slow when you have TBs of data.
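To put "really slow" in numbers, here is a quick estimate of how long rebalancing takes at the observed 15 MB/s versus the configured 1 GB/s. The 10 TB figure is hypothetical, and decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) are assumed for simplicity.

```python
TB = 10**12  # decimal units, assumed for simplicity
MB = 10**6


def hours_to_move(total_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours needed to move total_bytes at a sustained rate."""
    return total_bytes / rate_bytes_per_s / 3600


if __name__ == "__main__":
    data = 10 * TB  # hypothetical amount of data to rebalance
    for label, rate in [("15 MB/s", 15 * MB), ("1 GB/s", 1000 * MB)]:
        print(f"{label}: {hours_to_move(data, rate):.1f} h")
```

That works out to roughly 185 hours (about 7.7 days) at 15 MB/s versus under 3 hours at 1 GB/s.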
This 18-node cluster is completely idle; nothing is happening other than prep work to move data around and testing to get the cluster to production status.
If anyone knows the quirks of the new Hadoop, please help! At this snail's pace, adding or removing nodes would be disastrous for the Hadoop infrastructure. It would indeed become a small-data platform, not big data.
Created 10-23-2014 11:06 AM
There was a patch released that fixed this problem.
I updated with the bundle given to me by support.
Created 12-24-2014 03:17 AM
Created 01-16-2015 12:16 PM
We have created PATCH-434 for your s8cluster that is a backport of HDFS-6621 for CDH 5.1.0.
This patch was tested against CentOS 5.7 and CentOS 6+ which are our supported OS versions for CDH 5.1.0.
Created 01-17-2015 03:49 PM
@jakeri wrote: Hello,
We are also experiencing this. Could you point us to which patch this is?
Kind regards,
Jakob
The patch is part of CDH 5.1.4 and CDH 5.2.0 (and later versions). You can simply upgrade to v5.1.4 from v5.1.0 for the fix. If you have a support contract, you can log a case for a patch on a specific version of CDH.