hdfs balancer slow to move data around in cdh 5

Explorer

The new "hdfs balancer" command (I tried it via CM as well, with the same result) is slow to move data around to balance the nodes.

I have tried it with different thresholds and policies.

 

The old hdfs balancer in version cdh4.1.3 is pretty fast. It actually moves data around at the speed specified by dfs.balance.bandwidthPerSec. The old cluster moves data around rapidly, i.e. the balancer thread doesn't sleep.

 

The new one, however, seems either to ignore that setting or to let it be superseded by another value that is not tunable. The balancer sleeps for 35s, then moves a few blocks around, then sleeps again.

I have dfs.balance.bandwidthPerSec set to 1GB/s (or 10Gb).

Yet the cluster only gets balanced at 15MB/s, which is really slow when you have TBs of data.

This 18-node cluster is completely idle; nothing is happening other than prep work to move data around and testing to get the cluster to prod status.
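To put the two rates side by side, here is a rough back-of-the-envelope sketch in Python. The 1 TB volume is only an illustrative assumption (the post just says "TBs of data"), decimal units are assumed, and `hours_to_move` is a hypothetical helper name, not part of any Hadoop tooling.

```python
# Rough estimate of how long it takes to move a given volume of data.
# Assumptions: decimal units (1 MB = 10**6 bytes) and a constant rate.

def hours_to_move(total_bytes, bytes_per_sec):
    """Return the hours needed to move total_bytes at a constant rate."""
    return total_bytes / bytes_per_sec / 3600.0

ONE_TB = 10**12              # illustrative volume, not a figure from the cluster
OBSERVED_RATE = 15 * 10**6   # ~15 MB/s, the rate reported in the post
CONFIGURED_RATE = 10**9      # 1 GB/s, the configured bandwidth limit

if __name__ == "__main__":
    print("observed rate:   %.1f hours per TB" % hours_to_move(ONE_TB, OBSERVED_RATE))
    print("configured rate: %.2f hours per TB" % hours_to_move(ONE_TB, CONFIGURED_RATE))
```

Under these assumptions, each terabyte takes roughly 18.5 hours at 15MB/s, versus under 17 minutes if the configured limit were actually reached.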

 

If anyone knows the quirks of the new hadoop, please help! At this snail's pace, adding or removing nodes would be disastrous to the hadoop infrastructure. This would indeed become a small data platform, not big data.

 

 

2 ACCEPTED SOLUTIONS

Explorer

There was a patch released that fixed this problem.

I updated using the bundle given to me by support.

 

 


The patch is part of CDH 5.1.4 and CDH 5.2.0 (and later versions). You can simply upgrade to v5.1.4 from v5.1.0 for the fix.
Regards,
Gautam Gopalakrishnan


11 REPLIES

Expert Contributor
More than likely you are hitting HDFS-6621. This is fairly new so it hasn't been fixed in CDH5 as of yet.

Explorer

The issue sounds close, but what I am seeing is that the thread sleeps for 35s and then moves more data.

It doesn't exit. It continues to run, but sleeps 35s, then moves, then sleeps, then moves.

I measured the total rate to be about 15MB/s when it runs this way.

 

like so:

 

14/08/14 14:22:44 INFO balancer.Balancer: Successfully moved blk_1075451687_1713835 with size=132482663 from 10.2.2.3:50010 to 10.2.1.247:50010 through 10.2.1.248:50010
14/08/14 14:23:19 INFO net.NetworkTopology: Adding a new node: /default/10.2.1.253:50010

14/08/14 14:23:22 INFO balancer.Balancer: Successfully moved blk_1076792220_3054863 with size=134217728 from 10.2.1.253:50010 to 10.2.1.247:50010 through 10.2.2.7:50010
14/08/14 14:23:56 INFO net.NetworkTopology: Adding a new node: /default/10.2.2.2:50010

 

You can see that between the last move and the next time it wakes up to do a move, it is about 35s every time!
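The gap can be checked mechanically. Below is a minimal Python sketch that parses the excerpt above and measures the time between each completed move and the next log entry. It assumes the timestamps are in yy/MM/dd HH:mm:ss form; `parse` and `idle_gaps` are hypothetical helper names, not part of Hadoop.

```python
import re
from datetime import datetime

# The log excerpt from the post, copied verbatim.
LOG = """\
14/08/14 14:22:44 INFO balancer.Balancer: Successfully moved blk_1075451687_1713835 with size=132482663 from 10.2.2.3:50010 to 10.2.1.247:50010 through 10.2.1.248:50010
14/08/14 14:23:19 INFO net.NetworkTopology: Adding a new node: /default/10.2.1.253:50010
14/08/14 14:23:22 INFO balancer.Balancer: Successfully moved blk_1076792220_3054863 with size=134217728 from 10.2.1.253:50010 to 10.2.1.247:50010 through 10.2.2.7:50010
14/08/14 14:23:56 INFO net.NetworkTopology: Adding a new node: /default/10.2.2.2:50010
"""

LINE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO (\S+): (.*)$")
SIZE = re.compile(r"Successfully moved \S+ with size=(\d+)")

def parse(log_text):
    """Return (timestamp, logger, moved_bytes or None) for each INFO line."""
    events = []
    for line in log_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip blank or unrecognised lines
        # Assumed timestamp layout: yy/MM/dd HH:mm:ss
        ts = datetime.strptime(m.group(1), "%y/%m/%d %H:%M:%S")
        size = SIZE.search(m.group(3))
        events.append((ts, m.group(2), int(size.group(1)) if size else None))
    return events

def idle_gaps(events):
    """Seconds between each completed block move and the next log entry."""
    gaps = []
    for (ts, _, size), (next_ts, _, _) in zip(events, events[1:]):
        if size is not None:
            gaps.append((next_ts - ts).total_seconds())
    return gaps

if __name__ == "__main__":
    print(idle_gaps(parse(LOG)))
```

For this excerpt the gaps come out at 35 and 34 seconds, which matches the pause described above. Running the same functions over a longer balancer log would show whether the pause is constant.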

 

Sorry for the late response. What exact version of CDH 5.x are you running?
Regards,
Gautam Gopalakrishnan

Explorer

There was a patch released that fixed this problem.

I updated using the bundle given to me by support.

 

 

That's great news, thanks for the feedback
Regards,
Gautam Gopalakrishnan

New Contributor

It seems I have exactly the same issue with CDH 5.1.0.

Where can I get the patch you are talking about?

 

Thanks,

Eliran

The patch is part of CDH 5.1.4 and CDH 5.2.0 (and later versions). You can simply upgrade to v5.1.4 from v5.1.0 for the fix.
Regards,
Gautam Gopalakrishnan

New Contributor

God bless you!!!

I have been rebalancing for almost two months and it has been at 40%, and now in an hour it finished almost 10%.

We were going crazy here.

Thanks a lot

New Contributor

Hello,

 

We are also experiencing this. Could you point us to which patch this is?

 

Kind regards,

Jakob