Explorer
Posts: 18
Registered: ‎01-14-2015
Accepted Solution

HDFS Balancer isn't running by itself

The Balancer isn't running on its own. I have to run it manually from the command line.

 

 

Some of my HDFS nodes got to 97% full, while others were only 30-something% full.

 

It works fine from the command line.

 

How can I check to see why it is not running on its own? As I understand it, it is supposed to do so.

 

Thanks,

 

w

Explorer
Posts: 18
Registered: ‎01-14-2015

Re: HDFS Balancer isn't running by itself

[ Edited ]

It could actually be running and just not aggressively enough.

 

I've discovered the median size of our HDFS files is something like 6KB, so the balancer is fairly inefficient, since its execution time is more or less the same for each block, given a fairly fast network.

 

I'm going to have to schedule the balancer to run from cron once a day or so.
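For anyone wanting to do the same, here is a minimal sketch of generating such a crontab entry. It only builds strings and does not run anything. The `-threshold` option is a standard `hdfs balancer` flag; the run time (02:30), the threshold of 10 percent, and the log path are assumptions chosen for illustration, and the entry would normally go in the crontab of a user allowed to run the balancer.

```python
import shlex


def balancer_command(threshold_pct=10):
    """Build the argument list for one balancer run.

    -threshold is the allowed deviation, in percent, of each DataNode's
    utilization from the cluster-wide average.
    """
    if not 1 <= threshold_pct <= 100:
        raise ValueError("threshold must be between 1 and 100")
    return ["hdfs", "balancer", "-threshold", str(threshold_pct)]


def crontab_line(hour, minute, argv, log_path):
    """Format a crontab entry that runs argv once a day at hour:minute."""
    command = " ".join(shlex.quote(arg) for arg in argv)
    return f"{minute} {hour} * * * {command} >> {shlex.quote(log_path)} 2>&1"


if __name__ == "__main__":
    # Hypothetical schedule and log path; adjust for your cluster.
    print(crontab_line(2, 30, balancer_command(10), "/var/log/hdfs-balancer.log"))
```

Appending the output to a log file is a deliberate choice here: it gives you a record of when each run happened and what it moved.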

 

So my question is slightly modified: How can I tell when the balancer runs once the Balancer service is configured? There do not seem to be any parameters related to scheduling it.

 

One related question: How does the balancer choose which blocks to move? Does it favor small files over large ones? I ask because I used the output of the balancer ("moved block blah with size=..."), which includes the size of each block, as a sample of my file sizes. We actually have a slightly less than 1-to-1 blocks-to-file ratio, and of the 32000 files I sampled from the balancer run, only 2000 or so were "full" blocks of 64MB.
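The sampling described above can be sketched roughly as follows. The only thing assumed about the balancer output is that each relevant line contains a `size=<bytes>` field, as quoted in the post; the rest of the line is ignored. The function names and the sample lines in the usage note are made up for illustration.

```python
import re
import statistics

FULL_BLOCK = 64 * 1024 * 1024  # 64MB block size, as mentioned in the post

# Only the "size=<bytes>" field is assumed; everything else is ignored.
SIZE_RE = re.compile(r"size=(\d+)")


def block_sizes(lines):
    """Extract the block size in bytes from every line that has a size= field."""
    sizes = []
    for line in lines:
        match = SIZE_RE.search(line)
        if match:
            sizes.append(int(match.group(1)))
    return sizes


def summarize(sizes, full_block=FULL_BLOCK):
    """Return the count, median size, and share of full-size blocks."""
    if not sizes:
        raise ValueError("no block sizes found")
    full = sum(1 for size in sizes if size >= full_block)
    return {
        "count": len(sizes),
        "median_bytes": statistics.median(sizes),
        "full_blocks": full,
        "full_fraction": full / len(sizes),
    }
```

Usage would look like `summarize(block_sizes(open("balancer.log")))`, where `balancer.log` is a hypothetical file holding the saved balancer output. Note that the result describes only the blocks the balancer chose to move, so if the balancer does favor some block sizes, the sample is biased in the same way.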

Explorer
Posts: 18
Registered: ‎01-14-2015

Re: HDFS Balancer isn't running by itself

I also bumped up the bandwidth with dfsadmin (from 10 to 40Mb/sec) and the next run was quite effective.
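As a rough sketch of that change: `hdfs dfsadmin -setBalancerBandwidth` takes the limit in bytes per second, so the value has to be converted first. This assumes the figures above are megabytes per second (1 MB taken as 1024 * 1024 bytes); the function name is made up, and the code only builds the command rather than running it.

```python
def set_bandwidth_command(mb_per_sec):
    """Build the dfsadmin command that sets the balancer bandwidth limit.

    -setBalancerBandwidth expects bytes per second, so convert from MB/s.
    """
    if mb_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    bytes_per_sec = int(mb_per_sec * 1024 * 1024)
    return ["hdfs", "dfsadmin", "-setBalancerBandwidth", str(bytes_per_sec)]


if __name__ == "__main__":
    # 40 MB/s, the value mentioned in the post (unit is an assumption).
    print(" ".join(set_bandwidth_command(40)))
```

A value set this way is not persistent on the DataNodes, so it may need to be reapplied after they restart.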
Cloudera Employee
Posts: 509
Registered: ‎07-30-2013

Re: HDFS Balancer isn't running by itself

CM does not run the balancer automatically. It must be run manually, or you can set up a cron job to do it.

As for how exactly the balancer works, you can ask on the CDH forums for better expertise, or check the documentation.

Thanks,
Darren
New Contributor
Posts: 3
Registered: ‎04-28-2015

Re: HDFS Balancer isn't running by itself

Hi,

 

Do you mean to say that the Balancer role in HDFS doesn't actually balance the data?

Do we definitely have to run the balancer from the back end?

Is there any other way?

 

Thanks in advance

Shashank

ujj
Explorer
Posts: 7
Registered: ‎05-16-2016

Re: HDFS Balancer isn't running by itself

Is there an API route to run the HDFS balancer? I can't find it under the role commands.

Cloudera Employee
Posts: 509
Registered: ‎07-30-2013

Re: HDFS Balancer isn't running by itself

The balancer role just tells CM which host the balancer command should be executed from.

Running it via API is a bit weird. You start the role to run the balancer.
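A minimal sketch of what starting the role over the Cloudera Manager REST API might look like, using the service-level `roleCommands/start` endpoint. The host, API version (`v10`), cluster name, service name, and role name below are all placeholders, not values from this thread. The code only constructs the request; authentication and sending it are left out.

```python
import json
import urllib.parse
import urllib.request


def start_role_request(base_url, cluster, service, role_name, api_version="v10"):
    """Build a POST request that asks CM to start one role of a service.

    api_version is an assumption; use whatever version your CM supports.
    """
    url = "{}/api/{}/clusters/{}/services/{}/roleCommands/start".format(
        base_url.rstrip("/"),
        api_version,
        urllib.parse.quote(cluster, safe=""),
        urllib.parse.quote(service, safe=""),
    )
    # The endpoint takes a list of role names under "items".
    body = json.dumps({"items": [role_name]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # All four arguments are hypothetical placeholders.
    req = start_role_request(
        "http://cm-host.example.com:7180", "Cluster 1", "hdfs", "hdfs-BALANCER-1"
    )
    print(req.get_method(), req.full_url)
```

The actual role name can be looked up by listing the roles of the HDFS service and picking the one whose type is the balancer.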
Explorer
Posts: 7
Registered: ‎11-25-2014

Re: HDFS Balancer isn't running by itself

Slight hijack of the thread, but how does one move the balancer role to another host?

I wish to decommission the one that it is currently assigned to.
