HDFS Balancer exits without balancing


Command run through a shell script:

sudo -u hdfs -b hdfs balancer -threshold 5

Log: The Balancer exits successfully without balancing.

17/05/26 16:38:51 INFO balancer.Balancer: Using a threshold of 5.0
17/05/26 16:38:51 INFO balancer.Balancer: namenodes  = [hdfs://belongcluster1]
17/05/26 16:38:51 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 5.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
17/05/26 16:38:51 INFO balancer.Balancer: included nodes = []
17/05/26 16:38:51 INFO balancer.Balancer: excluded nodes = []
17/05/26 16:38:51 INFO balancer.Balancer: source nodes = []
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
17/05/26 16:38:53 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
17/05/26 16:38:53 INFO block.BlockTokenSecretManager: Setting block keys
17/05/26 16:38:53 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
17/05/26 16:38:53 INFO block.BlockTokenSecretManager: Setting block keys
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.blocksize = 134217728 (default=134217728)
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.144.YYY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.144.YYY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.145.YY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.145.YY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.145.YY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.144.YY:50010
17/05/26 16:38:53 INFO balancer.Balancer: 0 over-utilized: []
17/05/26 16:38:53 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
May 26, 2017 4:38:53 PM           0                  0 B                 0 B               -1 B
May 26, 2017 4:38:54 PM  Balancing took 2.773 seconds
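
For context on the "0 over-utilized" and "0 underutilized" lines in the log: as far as I understand it, the balancer compares each DataNode's DFS utilization with the cluster-wide average and only acts on nodes that differ from it by more than the threshold. Below is a minimal sketch of that comparison. The classify_nodes helper and its three-column input (node name, DFS used bytes, configured capacity bytes) are made up for illustration and are not part of Hadoop.

```shell
# classify_nodes THRESHOLD
# Reads "name dfs_used_bytes capacity_bytes" lines on stdin (hypothetical format)
# and prints each node's DFS utilization and how it compares with the
# cluster average plus or minus THRESHOLD percentage points.
classify_nodes() {
  awk -v t="$1" '
    { name[NR] = $1; used[NR] = $2; cap[NR] = $3; total_used += $2; total_cap += $3 }
    END {
      if (total_cap == 0) exit
      avg = 100 * total_used / total_cap
      for (i = 1; i <= NR; i++) {
        u = 100 * used[i] / cap[i]
        if (u > avg + t)      s = "over-utilized"
        else if (u < avg - t) s = "under-utilized"
        else                  s = "within-threshold"
        printf "%s %.1f %s\n", name[i], u, s
      }
    }'
}
```

With a threshold of 5, a node at 80% in a cluster averaging 55% would be flagged, while a node at 55% would not. This looks only at DFS usage, so a host-level disk view that also counts non-DFS data need not agree with it.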

The Ambari Host view indicates that the data is still not balanced across the nodes:


(Updated) The cluster has an HA configuration (primary-secondary).

Ambari :

Hadoop :

Any input will be helpful.


Super Guru
sudo -u hdfs -b hdfs balancer -threshold 5

What is the "-b" for in this command? Shouldn't this be:

sudo -u hdfs hdfs balancer -threshold 5


@mqureshi I used the "-b" option to push the process to the background. I have also tried the following from the server that hosts the NN.

Trial 1 (on the command prompt):

nohup sudo -u hdfs hdfs balancer -threshold 5 > /var/log/hadoop/hdfs/balancer.$(date +%F_%H-%M-%S.%N).log 2>&1 &

Trial 2 (on the command prompt). DH05 needs to be offloaded as it is the most unbalanced:

sudo -u hdfs -b hdfs balancer -threshold 5 -source DH05 > /var/log/hadoop/hdfs/balancer.$(date +%F_%H-%M-%S.%N).log 2>&1 &

I get the same output from the Balancer: it exits stating that "The cluster is balanced". It somehow does not seem to pick up the current stats on the data in the DataNodes.
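
One way to see the numbers the NameNode itself reports is to pull the per-DataNode "DFS Used%" values out of the hdfs dfsadmin -report output. The sketch below assumes each DataNode section of that report contains a "Hostname:" line followed later by a "DFS Used%:" line; dfs_used_per_node is a hypothetical helper name, and the field layout should be checked against your Hadoop version.

```shell
# dfs_used_per_node: read `hdfs dfsadmin -report` text on stdin and print
# "<hostname> <DFS Used%>" for each DataNode section.
# Assumption: every section has "Hostname: <name>" before "DFS Used%: <pct>".
# The cluster-wide "DFS Used%" summary line is skipped because no
# hostname has been seen yet when it appears.
dfs_used_per_node() {
  awk '
    /^Hostname:/  { host = $2 }
    /^DFS Used%:/ { if (host != "") { print host, $3; host = "" } }
  '
}
```

It could be fed with something like: sudo -u hdfs hdfs dfsadmin -report | dfs_used_per_node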

Super Guru


Do you also have a standby namenode? Can you try the following:

sudo -u hdfs -b hdfs balancer -fs hdfs://<your name node>:8020 -threshold 5
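
If the intent is to point -fs at whichever NameNode is currently active, its ID can be looked up with hdfs haadmin -getServiceState, which prints "active" or "standby" for a given NameNode ID. A minimal sketch follows; active_namenode is a hypothetical helper, and the IDs passed to it are whatever dfs.ha.namenodes.<nameservice> lists in your configuration.

```shell
# active_namenode ID...
# Prints the first NameNode ID whose service state is "active".
# Returns non-zero if none of the given IDs reports "active".
active_namenode() {
  for id in "$@"; do
    # Skip IDs whose state cannot be queried (e.g. the NameNode is down).
    state=$(hdfs haadmin -getServiceState "$id" 2>/dev/null) || continue
    if [ "$state" = "active" ]; then
      printf '%s\n' "$id"
      return 0
    fi
  done
  return 1
}
```

For example, active_namenode nn1 nn2 would print the active ID, and hdfs getconf -confKey dfs.namenode.rpc-address.<nameservice>.<id> should then give its host:port. The haadmin call may need to run as the hdfs user.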


@mqureshi The cluster has a primary and secondary configuration for the NN. When I run the balancer command as you indicated, I get an error stating "Another Balancer is running", but ps -ef | grep balancer does not show any running balancer process.

[root@dh01 ~]# sudo -u hdfs hdfs balancer -fs hdfs:// -threshold 5
17/05/29 12:14:53 INFO balancer.Balancer: Using a threshold of 5.0
17/05/29 12:14:53 INFO balancer.Balancer: namenodes  = [hdfs://belongcluster1, hdfs://]
17/05/29 12:14:53 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 5.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
17/05/29 12:14:53 INFO balancer.Balancer: included nodes = []
17/05/29 12:14:53 INFO balancer.Balancer: excluded nodes = []
17/05/29 12:14:53 INFO balancer.Balancer: source nodes = []
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
17/05/29 12:14:54 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
17/05/29 12:14:54 INFO block.BlockTokenSecretManager: Setting block keys
17/05/29 12:14:54 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
17/05/29 12:14:55 INFO block.BlockTokenSecretManager: Setting block keys
17/05/29 12:14:55 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
17/05/29 12:14:55 INFO block.BlockTokenSecretManager: Setting block keys
17/05/29 12:14:55 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
Another Balancer is running..  Exiting ...
May 29, 2017 12:14:55 PM Balancing took 2.431 seconds
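
In this second log the "namenodes" line lists two URIs instead of one, apparently because the -fs value was added alongside the configured nameservice. One possible explanation for the "Another Balancer is running" message, offered as an assumption rather than a confirmed diagnosis, is that both URIs reach the same namespace and the second attempt finds the balancer's lock already held by the first. A small sketch that counts the URIs on that log line (count_balancer_namenodes is a hypothetical helper):

```shell
# count_balancer_namenodes: read balancer log text on stdin and print how many
# URIs appear in the first "balancer.Balancer: namenodes = [...]" line.
# Prints 0 if no such line is found.
count_balancer_namenodes() {
  awk '
    /balancer\.Balancer: namenodes/ && match($0, /\[.*\]/) {
      list = substr($0, RSTART + 1, RLENGTH - 2)   # text between [ and ]
      print split(list, uris, /, */)
      found = 1
      exit
    }
    END { if (!found) print 0 }'
}
```

A result greater than 1 on a single-nameservice cluster would suggest that the balancer was handed the same namespace twice.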

Super Guru



About :

My hdfs-site.xml has 2 entries. I am not sure whether I need to delete both or only NN2.


Super Guru


I think you should have only one value and it should point to your "name service". You should have a value for name service when you have HA enabled. See the following link on how this works in HA - third row:


@mqureshi I found another thread with a similar issue: there they indicate that if HA is enabled, one needs to remove dfs.namenode.rpc-address. I ran a check on the Ambari Server using

/var/lib/ambari-server/resources/scripts/ -u admin -p admin -port 8080 get belong1 hdfs-site

and the output does not contain the dfs.namenode.rpc-address property.

########## Performing 'GET' on (Site:hdfs-site, Tag:version1470359698835)
"properties" : {
"dfs.block.access.token.enable" : "true",
"dfs.blockreport.initialDelay" : "120",
"dfs.blocksize" : "134217728",
"dfs.client.block.write.replace-datanode-on-failure.enable" : "NEVER",
"dfs.client.failover.proxy.provider.belongcluster1" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
"" : "true",
"" : "4096",
"dfs.client.retry.policy.enabled" : "false",
"dfs.cluster.administrators" : " hdfs",
"dfs.content-summary.limit" : "5000",
"dfs.datanode.address" : "",
"dfs.datanode.balance.bandwidthPerSec" : "6250000",
"" : "/data/hadoop/hdfs/data",
"" : "750",
"dfs.datanode.du.reserved" : "1073741824",
"dfs.datanode.failed.volumes.tolerated" : "0",
"dfs.datanode.http.address" : "",
"dfs.datanode.https.address" : "",
"dfs.datanode.ipc.address" : "",
"dfs.datanode.max.transfer.threads" : "16384",
"dfs.domain.socket.path" : "/var/lib/hadoop-hdfs/dn_socket",
"" : "AES/CTR/NoPadding",
"dfs.encryption.key.provider.uri" : "",
"dfs.ha.automatic-failover.enabled" : "true",
"dfs.ha.fencing.methods" : "shell(/bin/true)",
"dfs.ha.namenodes.belongcluster1" : "nn1,nn2",
"dfs.heartbeat.interval" : "3",
"dfs.hosts.exclude" : "/etc/hadoop/conf/dfs.exclude",
"dfs.http.policy" : "HTTP_ONLY",
"dfs.https.port" : "50470",
"dfs.journalnode.edits.dir" : "/hadoop/hdfs/journal",
"dfs.journalnode.https-address" : "",
"dfs.namenode.accesstime.precision" : "0",
"dfs.namenode.acls.enabled" : "true",
"dfs.namenode.audit.log.async" : "true",
"" : "true",
"dfs.namenode.avoid.write.stale.datanode" : "true",
"dfs.namenode.checkpoint.dir" : "/tmp/hadoop/hdfs/namesecondary",
"dfs.namenode.checkpoint.edits.dir" : "${dfs.namenode.checkpoint.dir}",
"dfs.namenode.checkpoint.period" : "21600",
"dfs.namenode.checkpoint.txns" : "1000000",
"dfs.namenode.fslock.fair" : "false",
"dfs.namenode.handler.count" : "200",
"dfs.namenode.http-address" : "",
"dfs.namenode.http-address.belongcluster1.nn1" : "",
"dfs.namenode.http-address.belongcluster1.nn2" : "",
"dfs.namenode.https-address" : "",
"dfs.namenode.https-address.belongcluster1.nn1" : "",
"dfs.namenode.https-address.belongcluster1.nn2" : "",
"" : "/data/hadoop/hdfs/namenode",
"" : "true",
"dfs.namenode.rpc-address.belongcluster1.nn1" : "",
"dfs.namenode.rpc-address.belongcluster1.nn2" : "",
"dfs.namenode.safemode.threshold-pct" : "0.99",
"dfs.namenode.shared.edits.dir" : "qjournal://;;",
"dfs.namenode.stale.datanode.interval" : "30000",
"dfs.namenode.startup.delay.block.deletion.sec" : "3600",
"dfs.namenode.write.stale.datanode.ratio" : "1.0f",
"dfs.nameservices" : "belongcluster1",
"dfs.permissions.enabled" : "true",
"dfs.permissions.superusergroup" : "hdfs",
"dfs.replication" : "3",
"dfs.replication.max" : "50",
"" : "true",
"dfs.webhdfs.enabled" : "true",
"fs.permissions.umask-mode" : "022",
"nfs.exports.allowed.hosts" : "* rw",
"nfs.file.dump.dir" : "/tmp/.hdfs-nfs"

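In an HA setup the per-NameNode RPC addresses are keyed as dfs.namenode.rpc-address.<nameservice>.<namenode id>, which is consistent with the unsuffixed property being absent from the dump above. The sketch below derives the expected key names from a dump in the same "key" : "value" layout. expected_rpc_keys is a hypothetical helper and assumes a single nameservice (no federation).

```shell
# expected_rpc_keys: read a properties dump ("key" : "value", one per line)
# on stdin and print the HA rpc-address keys implied by dfs.nameservices
# and dfs.ha.namenodes.<nameservice>.
# Assumption: dfs.nameservices holds exactly one nameservice.
expected_rpc_keys() {
  awk -F'"' '
    $2 == "dfs.nameservices"            { ns = $4 }
    index($2, "dfs.ha.namenodes.") == 1 { ids[substr($2, 18)] = $4 }
    END {
      n = split(ids[ns], id, ",")
      for (i = 1; i <= n; i++)
        print "dfs.namenode.rpc-address." ns "." id[i]
    }'
}
```

For the dump above this would list the nn1 and nn2 keys of belongcluster1, which can then be checked for non-empty values.
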
Are you suggesting that I keep just one NameNode service address and point it to the primary NameNode host:port? Something like the below:


Super Guru

Actually, don't delete anything. Your version of Ambari does not seem to be affected by this bug. Try the following:

sudo -u hdfs -b hdfs balancer -fs hdfs://belongcluster1:8020 -threshold 5

My guess is that you were only missing the port number. Can you please try it?