Created 05-28-2017 11:07 PM
Command run through a shell script (output logged):
sudo -u hdfs -b hdfs balancer -threshold 5
Log: the balancer exits successfully without balancing anything.
17/05/26 16:38:51 INFO balancer.Balancer: Using a threshold of 5.0
17/05/26 16:38:51 INFO balancer.Balancer: namenodes = [hdfs://belongcluster1]
17/05/26 16:38:51 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 5.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
17/05/26 16:38:51 INFO balancer.Balancer: included nodes = []
17/05/26 16:38:51 INFO balancer.Balancer: excluded nodes = []
17/05/26 16:38:51 INFO balancer.Balancer: source nodes = []
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
17/05/26 16:38:53 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
17/05/26 16:38:53 INFO block.BlockTokenSecretManager: Setting block keys
17/05/26 16:38:53 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
17/05/26 16:38:53 INFO block.BlockTokenSecretManager: Setting block keys
17/05/26 16:38:53 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
17/05/26 16:38:53 INFO balancer.Balancer: dfs.blocksize = 134217728 (default=134217728)
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.144.YYY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.144.YYY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.145.YY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.145.YY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.145.YY:50010
17/05/26 16:38:53 INFO net.NetworkTopology: Adding a new node: /default-rack/58.XXX.144.YY:50010
17/05/26 16:38:53 INFO balancer.Balancer: 0 over-utilized: []
17/05/26 16:38:53 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
May 26, 2017 4:38:53 PM   0   0 B   0 B   -1 B
May 26, 2017 4:38:54 PM   Balancing took 2.773 seconds
The Ambari Hosts view indicates that the data is still not balanced across the nodes.
(Updated) The cluster has an HA configuration (active and standby NameNodes).
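For reference, the log's "0 over-utilized / 0 underutilized" lines mean every DataNode's DFS utilization was within the 5% threshold of the cluster average, so the balancer had nothing to move. Per-node utilization can be cross-checked against what Ambari shows with a stock dfsadmin report (a minimal sketch; the grep pattern assumes the standard report layout):

# Print each DataNode's name and its DFS Used% from the cluster report.
# With -threshold 5, a node is a move candidate only if its DFS Used%
# differs from the cluster-wide average by more than 5 percentage points.
sudo -u hdfs hdfs dfsadmin -report | grep -E '^Name:|DFS Used%'

(Note that Ambari's host view can include non-DFS disk usage, which the balancer ignores, so the two views can legitimately disagree.)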
Ambari : 2.2.1.0
Hadoop : 2.7.1.2.4.0.0-169
Any input will be helpful.
Created 05-29-2017 05:36 AM
Command: tried running it directly, without pushing it to the background:
sudo -u hdfs hdfs balancer -fs hdfs://belongcluster1:8020 -threshold 5
[ayguha@dh01 ~]$ sudo -u hdfs hdfs balancer -fs hdfs://belongcluster1:8020 -threshold 5
17/05/29 15:29:39 INFO balancer.Balancer: Using a threshold of 5.0
17/05/29 15:29:39 INFO balancer.Balancer: namenodes = [hdfs://belongcluster1, hdfs://belongcluster1:8020]
17/05/29 15:29:39 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 5.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
17/05/29 15:29:39 INFO balancer.Balancer: included nodes = []
17/05/29 15:29:39 INFO balancer.Balancer: excluded nodes = []
17/05/29 15:29:39 INFO balancer.Balancer: source nodes = []
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
17/05/29 15:29:41 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
17/05/29 15:29:41 INFO block.BlockTokenSecretManager: Setting block keys
17/05/29 15:29:41 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
17/05/29 15:29:42 INFO block.BlockTokenSecretManager: Setting block keys
17/05/29 15:29:42 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
17/05/29 15:29:42 INFO block.BlockTokenSecretManager: Setting block keys
17/05/29 15:29:42 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
java.io.IOException: Another Balancer is running.. Exiting ...
May 29, 2017 3:29:42 PM   Balancing took 3.035 seconds
Error:
17/05/29 15:29:42 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
java.io.IOException: Another Balancer is running.. Exiting ...
Also checked whether a balancer process was stuck; from the output below, nothing appears to be hanging from previous attempts.
dh01 ~]$ ps -ef | grep "balancer"
ayguha    4611  2551  0 15:34 pts/0    00:00:00 grep balancer
dh01 ~]$ hdfs dfs -ls /system/balancer.id
ls: `/system/balancer.id': No such file or directory
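For context: the balancer takes a cluster-wide lock by creating /system/balancer.id, and the "Another Balancer is running" IOException is raised when that file cannot be re-created. Since the ls above shows no such file, the lock may be held through a still-open lease rather than a leftover file; if a stale file were present, the usual cleanup would be (a sketch, assuming no live balancer actually owns the lock):

# Remove the stale lock file left behind by a previous balancer run, then retry.
sudo -u hdfs hdfs dfs -rm /system/balancer.id
sudo -u hdfs hdfs balancer -threshold 5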
Created 05-29-2017 05:48 AM
So here is your problem:
INFO balancer.Balancer: namenodes = [hdfs://belongcluster1, hdfs://belongcluster1:8020]
Only the active NameNode should be listed here, but it's showing both. Do you have the following property in your configs (exactly this property):
"dfs.namenode.rpc-address"?
Created on 05-29-2017 06:17 AM - edited 08-18-2019 12:39 AM
@mqureshi The cluster currently has only one active NameNode.
Is there a better way to find out which NameNode is active? I used the following as well, but it does not distinguish between the two:
curl --user admin:admin http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/host_components?HostRoles/component_name=...
dh01 ~]$ curl --user admin:admin http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/host_components?HostRoles/component_name=...
[1] 16533
-bash: metrics/dfs/FSNamesystem/HAState=active: No such file or directory
[ayguha@dh01 ~]$
{
  "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/host_components?HostRoles/component_name=NAMENODE",
  "items" : [
    {
      "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh01.int.belong.com.au/host_components/NAMENODE",
      "HostRoles" : {
        "cluster_name" : "belong1",
        "component_name" : "NAMENODE",
        "host_name" : "dh01.int.belong.com.au"
      },
      "host" : {
        "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh01.int.belong.com.au"
      }
    },
    {
      "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh02.int.belong.com.au/host_components/NAMENODE",
      "HostRoles" : {
        "cluster_name" : "belong1",
        "component_name" : "NAMENODE",
        "host_name" : "dh02.int.belong.com.au"
      },
      "host" : {
        "href" : "http://dh01.int.belong.com.au:8080/api/v1/clusters/belong1/hosts/dh02.int.belong.com.au"
      }
    }
  ]
}
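Side note: the "[1] 16533" and "-bash: metrics/dfs/FSNamesystem/HAState=active: No such file or directory" lines suggest the URL contained an unescaped &, which backgrounded curl and made bash try to run the rest of the URL as a command; wrapping the whole URL in single quotes avoids that. Independent of Ambari, the HA state can also be asked from HDFS itself (a sketch; nn1/nn2 are placeholder service IDs, read the real ones from dfs.ha.namenodes.belongcluster1):

# Prints "active" or "standby" for each configured NameNode ID.
sudo -u hdfs hdfs haadmin -getServiceState nn1
sudo -u hdfs hdfs haadmin -getServiceState nn2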
Also, hdfs-site.xml does not contain the property dfs.namenode.rpc-address.
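That is expected with NameNode HA: the flat dfs.namenode.rpc-address is replaced by per-NameNode variants keyed on the nameservice and NameNode IDs. A sketch for inspecting them (the nn1/nn2 suffixes are assumptions; substitute the IDs the second command returns):

# The nameservice(s) defined for the cluster:
hdfs getconf -confKey dfs.nameservices
# The NameNode IDs within the belongcluster1 nameservice:
hdfs getconf -confKey dfs.ha.namenodes.belongcluster1
# Per-NameNode RPC addresses (IDs assumed to be nn1/nn2):
hdfs getconf -confKey dfs.namenode.rpc-address.belongcluster1.nn1
hdfs getconf -confKey dfs.namenode.rpc-address.belongcluster1.nn2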