It could actually be running and just not aggressively enough.
I've discovered that the median size of our HDFS files is something like 6KB, so the balancer is fairly inefficient: given a fairly fast network, its execution time is more or less the same for each block, regardless of how big the block is.
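To put a rough number on that inefficiency, here is a back-of-the-envelope sketch. It assumes, as above, that every block move costs the same wall-clock time; SECONDS_PER_MOVE is a made-up placeholder, not a measured value, and the ratio between the two cases doesn't depend on it.

```python
KIB = 1024
MIB = 1024 * 1024


def bytes_per_hour(block_size_bytes: int, seconds_per_move: float) -> float:
    """Balancer throughput if every block move takes the same time (assumption)."""
    return block_size_bytes * 3600 / seconds_per_move


# Hypothetical per-block cost; only the ratio between the two cases matters here.
SECONDS_PER_MOVE = 0.5

small = bytes_per_hour(6 * KIB, SECONDS_PER_MOVE)   # median file/block of ~6KB
full = bytes_per_hour(64 * MIB, SECONDS_PER_MOVE)   # a full 64MB block

print(f"6KB blocks:  {small / MIB:.1f} MiB/hour")
print(f"64MB blocks: {full / MIB:.1f} MiB/hour")
print(f"full blocks move about {full / small:.0f}x more data per hour")
```

In other words, under that assumption a balancer run spent mostly on ~6KB blocks moves roughly ten thousand times less data per hour than one moving full blocks.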
I'm going to have to schedule the balancer to run from cron once a day or so.
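One way to do that is a small wrapper that cron calls and that logs when each run starts and finishes, which also makes it easy to see afterwards when the balancer actually ran. This is a minimal sketch, assuming the `hdfs` CLI is on the PATH of the cron user; the script name, paths, and the 10% threshold are illustrative choices, not anything from our setup. A crontab entry might look like `0 2 * * * /usr/bin/python3 /opt/scripts/run_balancer.py 10 >> /var/log/run_balancer.log 2>&1` (hypothetical paths).

```python
# run_balancer.py: hypothetical cron wrapper around `hdfs balancer`.
import datetime
import subprocess
import sys


def build_command(threshold_percent: str) -> list[str]:
    # `-threshold N`: balance until each DataNode's disk usage is within
    # N percent of the cluster-wide average (10 is the default).
    return ["hdfs", "balancer", "-threshold", threshold_percent]


def run_balancer(threshold_percent: str) -> int:
    # Timestamped start/end lines end up in the cron log redirect.
    print(f"balancer start {datetime.datetime.now().isoformat()}", flush=True)
    rc = subprocess.run(build_command(threshold_percent)).returncode
    print(f"balancer end {datetime.datetime.now().isoformat()} rc={rc}", flush=True)
    return rc


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: run_balancer.py THRESHOLD   (e.g. run_balancer.py 10)
    sys.exit(run_balancer(sys.argv[1]))
```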
So my question is slightly modified: how can I tell when the balancer runs when the Balancer service is configured? There don't seem to be any parameters related to scheduling it.
One related question: how does the balancer choose which blocks to move? Does it favor small files over large ones? I ask because I used the balancer's output ("moved block blah with size=..."), which includes the size of each block, as a sample of my file sizes. We actually have slightly less than a 1-to-1 blocks-to-files ratio, and of the 32000 files I sampled from the balancer run, only 2000 or so were "full" 64MB blocks.
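For anyone wanting to repeat that sampling, here is a rough sketch of pulling the sizes out of the balancer output. It only assumes what's visible in the message quoted above, namely a `size=<integer>` field, and treats that integer as bytes (an assumption); the sample lines in the test are made up. Note that it can only describe the blocks the balancer chose to move, which is exactly why the block-selection question matters.

```python
import re
from statistics import median

# Assumes each "moved block ... with size=..." line carries size=<bytes>.
SIZE_RE = re.compile(r"size=(\d+)")
FULL_BLOCK = 64 * 1024 * 1024  # our 64MB block size


def block_sizes(lines):
    """Collect the size= values from balancer output lines, skipping others."""
    sizes = []
    for line in lines:
        match = SIZE_RE.search(line)
        if match:
            sizes.append(int(match.group(1)))
    return sizes


def summarize(sizes, full_block=FULL_BLOCK):
    """Count moved blocks, how many were full blocks, and the median size."""
    full = sum(1 for s in sizes if s == full_block)
    return {
        "blocks": len(sizes),
        "full_blocks": full,
        "full_fraction": full / len(sizes) if sizes else 0.0,
        "median_bytes": median(sizes) if sizes else None,
    }
```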