Support Questions

Find answers, ask questions, and share your expertise

HDFS Balancer exits without balancing

Contributor

Command run through a shell script:

....Logging
sudo -u hdfs -b hdfs balancer -threshold 5
....

Log: the balancer exits, reporting success, without balancing anything.

17/05/26 16:38:51 INFO balancer.Balancer: Using a threshold of 5.0
17/05/26 16:38:51 INFO balancer.Balancer: namenodes  = [hdfs://belongcluster1]
17/05/26 16:38:51 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 5.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
17/05/26 16:38:51 INFO balancer.Balancer: included nodes = []
17/05/26 16:38:51 INFO balancer.Balancer: excluded nodes = []
17/05/26 16:38:51 INFO balancer.Balancer: source nodes = []
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
17/05/26 16:38:53 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
17/05/26 16:38:53 INFO block.BlockTokenSecretManager: Setting block keys
17/05/26 16:38:53 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
17/05/26 16:38:53 INFO block.BlockTokenSecretManager: Setting block keys
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.blocksize = 134217728 (default=134217728)
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.144.YYY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.144.YYY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.145.YY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.145.YY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.145.YY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.144.YY:50010
17/05/26 16:38:53 INFO balancer.Balancer: 0 over-utilized: []
17/05/26 16:38:53 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
May 26, 2017 4:38:53 PM           0                  0 B                 0 B               -1 B
May 26, 2017 4:38:54 PM  Balancing took 2.773 seconds

The Ambari Host view indicates that the data is still not balanced across the nodes:

hostviewambari.jpg
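For context, -threshold 5 tells the balancer a DataNode counts as balanced when its utilization is within 5 percentage points of the cluster average, so "0 over-utilized / 0 underutilized" in the log means every node fell inside that band at the moment of the run. A quick way to eyeball the per-node figures the balancer works from is the dfsadmin report (a sketch; the sample values below are made up, with addresses redacted like the thread's):

```shell
# Print each DataNode's address and its DFS Used% from the dfsadmin report;
# guarded so this only runs where the 'hdfs' CLI is available.
if command -v hdfs >/dev/null 2>&1; then
  sudo -u hdfs hdfs dfsadmin -report | grep -E '^(Name|DFS Used%):'
fi

# The same filter, exercised on a sample of the report's line format:
sample='Name: 58.XXX.144.YYY:50010
DFS Used%: 62.10%
Name: 58.XXX.145.YY:50010
DFS Used%: 14.35%'
printf '%s\n' "$sample" | grep -E '^(Name|DFS Used%):'
```

If the percentages really do spread wider than the threshold while the balancer still reports "balanced", that points at the balancer not seeing the cluster you think it is seeing (for instance, talking to the wrong namespace).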

(Updated) The cluster has an HA configuration (primary/secondary NameNodes).

Ambari : 2.2.1.0

Hadoop : 2.7.1.2.4.0.0-169

Any input will be helpful.

12 REPLIES

Contributor

@mqureshi

Command: tried running it directly, without pushing it to the background: sudo -u hdfs hdfs balancer -fs hdfs://belongcluster1:8020 -threshold 5

[ayguha@dh01 ~]$ sudo -u hdfs hdfs balancer -fs hdfs://belongcluster1:8020 -threshold 5
17/05/29 15:29:39 INFO balancer.Balancer: Using a threshold of 5.0
17/05/29 15:29:39 INFO balancer.Balancer: namenodes  = [hdfs://belongcluster1, hdfs://belongcluster1:8020]
17/05/29 15:29:39 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 5.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
17/05/29 15:29:39 INFO balancer.Balancer: included nodes = []
17/05/29 15:29:39 INFO balancer.Balancer: excluded nodes = []
17/05/29 15:29:39 INFO balancer.Balancer: source nodes = []
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
17/05/29 15:29:41 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
17/05/29 15:29:41 INFO block.BlockTokenSecretManager: Setting block keys
17/05/29 15:29:41 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
17/05/29 15:29:42 INFO block.BlockTokenSecretManager: Setting block keys
17/05/29 15:29:42 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
17/05/29 15:29:42 INFO block.BlockTokenSecretManager: Setting block keys
17/05/29 15:29:42 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
java.io.IOException: Another Balancer is running..  Exiting ...
May 29, 2017 3:29:42 PM  Balancing took 3.035 seconds

Error:

17/05/29 15:29:42 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
java.io.IOException: Another Balancer is running..  Exiting ...

Also checked whether a balancer process was stuck; from the output, nothing appears to be left hanging from previous runs.

dh01 ~]$ ps -ef | grep "balancer"
ayguha    4611  2551  0 15:34 pts/0    00:00:00 grep balancer

dh01 ~]$ hdfs dfs -ls /system/balancer.id
ls: `/system/balancer.id': No such file or directory
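One thing worth noting from this run's log: namenodes = [hdfs://belongcluster1, hdfs://belongcluster1:8020]. Passing -fs hdfs://belongcluster1:8020 on top of the nameservice already configured in fs.defaultFS makes the balancer treat the same namespace as two separate namenodes, and the second registration would then collide with the lock taken by the first, which would explain "Another Balancer is running" even though no stale /system/balancer.id exists. A sketch of how to check this (the getconf keys are standard Hadoop config keys; the string comparison at the bottom is purely illustrative, with the values from this thread hard-coded):

```shell
# Inspect the configured default filesystem and nameservices
# (guarded so this only runs on a node with the 'hdfs' CLI):
if command -v hdfs >/dev/null 2>&1; then
  hdfs getconf -confKey fs.defaultFS
  hdfs getconf -confKey dfs.nameservices
fi

# Illustrative check: is the -fs value just the default nameservice plus a port?
default_fs='hdfs://belongcluster1'    # assumed value of fs.defaultFS
fs_flag='hdfs://belongcluster1:8020'  # value passed with -fs
case "$fs_flag" in
  "$default_fs" | "$default_fs":*)
    echo "same nameservice: drop -fs and rely on fs.defaultFS" ;;
  *)
    echo "different filesystem" ;;
esac
```

If they do match, the simplest fix to try is running the balancer with no -fs flag at all, as in the original command.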

Super Guru

@Suhel

So here is your problem:

INFO balancer.Balancer: namenodes  = [hdfs://belongcluster1, hdfs://belongcluster1:8020]

There should be only the active NameNode listed here, but it is showing both. Do you have the following property in your configs (exactly this property name):

"dfs.namenode.rpc-address"?

avatar
Contributor

@mqureshi The cluster currently has only one active NameNode.

15843-active-namenodes.jpg

Is there a better way to find out which NameNode is active? I used the following as well, but it does not distinguish between them:

curl --user admin:admin http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/host_components?HostRoles/component_name=...
dh01 ~]$ curl --user admin:admin http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/host_components?HostRoles/component_name=...
[1] 16533
-bash: metrics/dfs/FSNamesystem/HAState=active: No such file or directory
[ayguha@dh01 ~]$ {
  "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/host_components?HostRoles/component_name=NAMENODE",
  "items" : [
    {
      "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh01.int.belong.com.au/host_components/NAMENODE",
      "HostRoles" : {
        "cluster_name" : "belong1",
        "component_name" : "NAMENODE",
        "host_name" : "dh01.int.belong.com.au"
      },
      "host" : {
        "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh01.int.belong.com.au"
      }
    },
    {
      "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh02.int.belong.com.au/host_components/NAMENODE",
      "HostRoles" : {
        "cluster_name" : "belong1",
        "component_name" : "NAMENODE",
        "host_name" : "dh02.int.belong.com.au"
      },
      "host" : {
        "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh02.int.belong.com.au"
      }
    }
  ]
}



Also, hdfs-site.xml does not contain the property dfs.namenode.rpc-address.
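Two notes on the Ambari attempt above. The stray [1] 16533 and the -bash: ... No such file or directory error appear because the URL was unquoted: the shell treats & as "run in background" and tries to execute the rest of the URL as a command, so the HAState filter never reaches curl. Quoting the URL avoids that. Independently of Ambari, hdfs haadmin -getServiceState reports active/standby directly. Both are sketched below; the metrics filter on the URL is an assumed reconstruction of the truncated command, and nn1/nn2 are hypothetical service IDs, so list the real ones first:

```shell
# Quote the URL so '?', '&' and '=' reach curl instead of the shell.
# (HAState filter reconstructed from the bash error above -- an assumption.)
url='http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/host_components?HostRoles/component_name=NAMENODE&metrics/dfs/FSNamesystem/HAState=active'
# curl --user admin:admin "$url"

# Ask HDFS directly which NameNode is active (run on a cluster node;
# the service IDs nn1/nn2 are hypothetical -- list the real ones first):
if command -v hdfs >/dev/null 2>&1; then
  hdfs getconf -confKey dfs.ha.namenodes.belongcluster1
  hdfs haadmin -getServiceState nn1
  hdfs haadmin -getServiceState nn2
fi
```

haadmin prints simply "active" or "standby" per service ID, which is easier to script against than the Ambari JSON.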