HDFS Balancer exits without balancing

Contributor

Command run through a shell script:

....Logging
sudo -u hdfs -b hdfs balancer -threshold 5
....

Log: the balancer exits immediately, reporting success without moving any data.

17/05/26 16:38:51 INFO balancer.Balancer: Using a threshold of 5.0
17/05/26 16:38:51 INFO balancer.Balancer: namenodes  = [hdfs://belongcluster1]
17/05/26 16:38:51 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 5.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
17/05/26 16:38:51 INFO balancer.Balancer: included nodes = []
17/05/26 16:38:51 INFO balancer.Balancer: excluded nodes = []
17/05/26 16:38:51 INFO balancer.Balancer: source nodes = []
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
17/05/26 16:38:53 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
17/05/26 16:38:53 INFO block.BlockTokenSecretManager: Setting block keys
17/05/26 16:38:53 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
17/05/26 16:38:53 INFO block.BlockTokenSecretManager: Setting block keys
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.blocksize = 134217728 (default=134217728)
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.144.YYY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.144.YYY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.145.YY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.145.YY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.145.YY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.144.YY:50010
17/05/26 16:38:53 INFO balancer.Balancer: 0 over-utilized: []
17/05/26 16:38:53 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
May 26, 2017 4:38:53 PM           0                  0 B                 0 B               -1 B
May 26, 2017 4:38:54 PM  Balancing took 2.773 seconds

The Ambari Host view indicates that the data is still not balanced across the nodes:

[Screenshot: Ambari Hosts view (hostviewambari.jpg)]

(Updated) The cluster has an HA configuration (primary/secondary NameNodes).

Ambari : 2.2.1.0

Hadoop : 2.7.1.2.4.0.0-169

Any input will be helpful.
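For context on why the balancer can print "The cluster is balanced" while the Ambari Hosts view still shows uneven disk usage: the balancer only considers a DataNode over- or under-utilized when its utilization percentage differs from the cluster average by more than the threshold. Here is a small sketch of that policy (my own illustration, not Hadoop's source code):

```python
# Hedged sketch of the HDFS Balancer's threshold test: a DataNode is
# over-/under-utilized only when its utilization differs from the
# cluster-average utilization by more than `threshold` percentage points.
# Utilizations are percentages of each node's capacity.

def classify_nodes(utilizations, threshold=5.0):
    """Return (over, under) index lists, mimicking the balancer's policy."""
    avg = sum(utilizations) / len(utilizations)
    over = [i for i, u in enumerate(utilizations) if u > avg + threshold]
    under = [i for i, u in enumerate(utilizations) if u < avg - threshold]
    return over, under

# A cluster can look uneven in absolute bytes (nodes with different
# capacities) yet still be "balanced" by this definition:
print(classify_nodes([52.0, 55.0, 58.0]))  # -> ([], []) : "The cluster is balanced"
print(classify_nodes([30.0, 55.0, 80.0]))  # -> ([2], [0])
```

So if Ambari is charting bytes used rather than percent of capacity, and the nodes have different disk sizes, both outputs can be "correct" at the same time.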

12 Replies

Contributor

@mqureshi

Command: tried running it directly, without pushing it to the background:

[ayguha@dh01 ~]$ sudo -u hdfs hdfs balancer -fs hdfs://belongcluster1:8020 -threshold 5
17/05/29 15:29:39 INFO balancer.Balancer: Using a threshold of 5.0
17/05/29 15:29:39 INFO balancer.Balancer: namenodes  = [hdfs://belongcluster1, hdfs://belongcluster1:8020]
17/05/29 15:29:39 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 5.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
17/05/29 15:29:39 INFO balancer.Balancer: included nodes = []
17/05/29 15:29:39 INFO balancer.Balancer: excluded nodes = []
17/05/29 15:29:39 INFO balancer.Balancer: source nodes = []
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
17/05/29 15:29:41 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
17/05/29 15:29:41 INFO block.BlockTokenSecretManager: Setting block keys
17/05/29 15:29:41 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
17/05/29 15:29:42 INFO block.BlockTokenSecretManager: Setting block keys
17/05/29 15:29:42 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
17/05/29 15:29:42 INFO block.BlockTokenSecretManager: Setting block keys
17/05/29 15:29:42 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
java.io.IOException: Another Balancer is running..  Exiting ...
May 29, 2017 3:29:42 PM  Balancing took 3.035 seconds

Error:

17/05/29 15:29:42 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
java.io.IOException: Another Balancer is running..  Exiting ...

Also checked whether a balancer process was stuck; from the output below, nothing appears to be left hanging from previous runs.

dh01 ~]$ ps -ef | grep "balancer"
ayguha    4611  2551  0 15:34 pts/0    00:00:00 grep balancer

dh01 ~]$ hdfs dfs -ls /system/balancer.id
ls: `/system/balancer.id': No such file or directory

Super Guru

@Suhel

So here is your problem:

INFO balancer.Balancer: namenodes  = [hdfs://belongcluster1, hdfs://belongcluster1:8020]

Only the active NameNode should be listed here, but the balancer is treating hdfs://belongcluster1 and hdfs://belongcluster1:8020 as two separate namenodes, even though they refer to the same cluster. That would also explain the "Another Balancer is running" error: the balancer takes the lock through one entry and then runs into its own lock through the other. Do you have the following property in your configs (exactly this property):

"dfs.namenode.rpc-address"?
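If the two URIs in the log are really the same nameservice, with one simply spelling out the default NameNode RPC port 8020, then the balancer is effectively connecting to the same cluster twice. A quick way to see the duplication (my own illustration, not Hadoop's code):

```python
from urllib.parse import urlparse

DEFAULT_NN_PORT = 8020  # HDFS default NameNode RPC port

def normalize(uri):
    """Treat hdfs://host and hdfs://host:8020 as the same authority."""
    p = urlparse(uri)
    port = p.port if p.port is not None else DEFAULT_NN_PORT
    return f"{p.scheme}://{p.hostname}:{port}"

# The two "namenodes" from the balancer log collapse to one authority:
uris = ["hdfs://belongcluster1", "hdfs://belongcluster1:8020"]
print({normalize(u) for u in uris})  # -> {'hdfs://belongcluster1:8020'}
```

Which is why passing -fs with an explicit port on top of the configured default filesystem adds a second entry instead of replacing the first.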

Contributor

@mqureshi The cluster currently has only one active NameNode.

[Screenshot: active NameNodes (15843-active-namenodes.jpg)]

Is there a better way to find out which NameNode is active? I tried the following as well, but it does not distinguish between the two:

dh01 ~]$ curl --user admin:admin http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/host_components?HostRoles/component_name=...
[1] 16533
-bash: metrics/dfs/FSNamesystem/HAState=active: No such file or directory
[ayguha@dh01 ~]$ {
  "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/host_components?HostRoles/component_name=NAMENODE",
  "items" : [
    {
      "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh01.int.belong.com.au/host_components/NAMENODE",
      "HostRoles" : {
        "cluster_name" : "belong1",
        "component_name" : "NAMENODE",
        "host_name" : "dh01.int.belong.com.au"
      },
      "host" : {
        "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh01.int.belong.com.au"
      }
    },
    {
      "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh02.int.belong.com.au/host_components/NAMENODE",
      "HostRoles" : {
        "cluster_name" : "belong1",
        "component_name" : "NAMENODE",
        "host_name" : "dh02.int.belong.com.au"
      },
      "host" : {
        "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh02.int.belong.com.au"
      }
    }
  ]
}
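Two notes on the curl attempt above. First, the `[1] 16533` job line and the bash "No such file or directory" error suggest the URL was not quoted, so the shell split it at the `&` and sent curl to the background; quoting the whole URL in single quotes avoids that. Second, once the query also requests `metrics/dfs/FSNamesystem/HAState`, the active host can be picked out of the JSON. A sketch (the payload shape is an assumption based on the Ambari API response above plus the HAState metric; verify against your Ambari version):

```python
import json

def active_namenodes(payload):
    """Return host names whose NAMENODE component reports HAState 'active'."""
    hosts = []
    for item in payload.get("items", []):
        ha_state = (item.get("metrics", {})
                        .get("dfs", {})
                        .get("FSNamesystem", {})
                        .get("HAState"))
        if ha_state == "active":
            hosts.append(item["HostRoles"]["host_name"])
    return hosts

# Hypothetical payload mirroring the thread's two-NameNode cluster:
sample = json.loads("""{
  "items": [
    {"HostRoles": {"host_name": "dh01.int.belong.com.au"},
     "metrics": {"dfs": {"FSNamesystem": {"HAState": "active"}}}},
    {"HostRoles": {"host_name": "dh02.int.belong.com.au"},
     "metrics": {"dfs": {"FSNamesystem": {"HAState": "standby"}}}}
  ]
}""")
print(active_namenodes(sample))  # -> ['dh01.int.belong.com.au']
```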



Also, hdfs-site.xml does not contain the dfs.namenode.rpc-address property.
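Another way to check which NameNode is active, without going through Ambari, is the NameNode's own JMX endpoint (a sketch, assuming the default web port 50070 and the NameNodeStatus bean present in Hadoop 2.7; verify both on your cluster): fetch `http://<nn-host>:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus` from each NameNode host and read the State field. Parsing that response:

```python
import json

def ha_state(jmx_payload):
    """Extract the HA state ('active'/'standby') from a NameNode /jmx response."""
    for bean in jmx_payload.get("beans", []):
        if bean.get("name") == "Hadoop:service=NameNode,name=NameNodeStatus":
            return bean.get("State")
    return None

# Hypothetical /jmx response for the active NameNode:
sample = json.loads("""{
  "beans": [
    {"name": "Hadoop:service=NameNode,name=NameNodeStatus",
     "State": "active"}
  ]
}""")
print(ha_state(sample))  # -> active
```

If the HA nameservice and NameNode IDs are configured in hdfs-site.xml, `hdfs haadmin -getServiceState <namenode-id>` reports the same thing from the command line.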