Created 11-16-2018 11:30 AM
Hi,
I am having some issues rebalancing my HDP cluster (running 2.6). There's a node whose data directories are 100% full. I triggered the HDFS balancer from Ambari and also ran the balancer from the command line (hdfs balancer). I have not seen any changes on that node ...
What's the way forward, please?
Created 11-17-2018 09:33 PM
Could you share the architecture of your cluster? How many master nodes and DataNodes?
The DataNode spreads blocks across its disks using one of two volume-choosing policies:
Round-robin [default]: distributes new blocks uniformly across the available disks.
Available space: writes new blocks to the disk with the most free space (by percentage).
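If you want new writes to favor the emptier disks, you can switch the policy in hdfs-site.xml (a sketch; the threshold value below is just the Hadoop default, not a tuned recommendation):

<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<!-- Disks whose free space differs by less than this many bytes are treated
     as equal and still written round-robin; 10737418240 bytes = 10 GB. -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>

Note this only affects where new blocks land; it does not move existing blocks.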
Modify the property below (add it if it doesn't exist) to ensure your disks are never 100% full:
dfs.datanode.du.reserved=xx (reserved space in bytes per volume)
This always leaves the specified amount of space free on every DataNode disk.
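For example, to reserve roughly 50 GB per volume (an illustrative value; size it to your workload):

<!-- Space in bytes that HDFS leaves unused on every DataNode volume.
     53687091200 bytes = 50 GB; an example value, not a recommendation. -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>53687091200</value>
</property>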
Have you set this parameter in hdfs-site.xml: dfs.disk.balancer.enabled=true? Can you share the output of
$ hdfs dfsadmin -report
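To pull just the per-node utilization out of that report, a quick filter (the labels match the standard report output):

$ hdfs dfsadmin -report | grep -E 'Name:|DFS Used%'

Also, if your HDP 2.6 build includes the backported disk balancer (which dfs.disk.balancer.enabled=true suggests), you can even out the disks within a single DataNode directly (a sketch; substitute your hostname, and use the plan file path that the -plan step prints):

$ hdfs diskbalancer -plan <datanode-hostname>
$ hdfs diskbalancer -execute /system/diskbalancer/<date>/<datanode-hostname>.plan.json
$ hdfs diskbalancer -query <datanode-hostname>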
Did you run the balancer with a threshold?
$ hdfs balancer -threshold -help
Output: Expecting a number in the range of [1.0, 100.0]: -help
Now run:
$ hdfs balancer -threshold 9.0
With a threshold of 9.0, the balancer considers the cluster balanced once every DataNode's utilization is within 9% of the overall cluster utilization (it does not cap disks at 90% full).
Output:
18/11/17 21:49:16 INFO balancer.Balancer: Using a threshold of 9.0
18/11/17 21:49:16 INFO balancer.Balancer: namenodes = [hdfs://xxxxxx:8020]
18/11/17 21:49:16 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 9.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
18/11/17 21:49:16 INFO balancer.Balancer: included nodes = []
18/11/17 21:49:16 INFO balancer.Balancer: excluded nodes = []
18/11/17 21:49:16 INFO balancer.Balancer: source nodes = []
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
18/11/17 21:49:17 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
18/11/17 21:49:18 INFO block.BlockTokenSecretManager: Setting block keys
18/11/17 21:49:18 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
18/11/17 21:49:18 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
18/11/17 21:49:18 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
18/11/17 21:49:18 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
18/11/17 21:49:18 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
18/11/17 21:49:18 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
18/11/17 21:49:18 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
18/11/17 21:49:18 INFO block.BlockTokenSecretManager: Setting block keys
18/11/17 21:49:18 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
18/11/17 21:49:18 INFO balancer.Balancer: dfs.blocksize = 134217728 (default=134217728)
18/11/17 21:49:18 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.0.31:50010
18/11/17 21:49:18 INFO balancer.Balancer: 0 over-utilized: []
18/11/17 21:49:18 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
Nov 17, 2018 9:49:18 PM    0    0 B    0 B    0 B
Nov 17, 2018 9:49:18 PM    Balancing took 2.981 seconds
Re-run and validate afterwards; depending on the data size, it could take pretty long.
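If it's moving too slowly, you can raise the per-DataNode balancing bandwidth at runtime (a sketch; 104857600 bytes/s = 100 MB/s is just an example value):

$ hdfs dfsadmin -setBalancerBandwidth 104857600

This takes effect on the DataNodes immediately, without a restart, and lasts until they are restarted. dfs.datanode.balance.max.concurrent.moves (5 in the log above) can also be raised in hdfs-site.xml, though on this version that needs a DataNode restart.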
Created 11-19-2018 01:04 PM
Thanks for your response. My cluster has 11 nodes (3 masters and 8 workers). Yes, I ran the balancer with a threshold of 5. I see it's still running since Friday morning...
My DataNode:
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sdb    5.4T  5.1T    17M  100%  /grid/1
/dev/sdc    5.4T  5.1T   263M  100%  /grid/2
/dev/sdd    5.4T  5.1T   912M  100%  /grid/3
/dev/sde    5.4T  5.1T   283M  100%  /grid/4
/dev/sdf    5.4T  5.1T    95M  100%  /grid/5
/dev/sdg    5.4T  5.1T   388M  100%  /grid/6
/dev/sdh    5.4T  5.1T    22G  100%  /grid/7
/dev/sdi    5.4T  5.1T   694M  100%  /grid/8
/dev/sdj    5.4T  5.1T   843M  100%  /grid/9
/dev/sdk    5.4T  5.1T    36M  100%  /grid/10
/dev/sdl    5.4T  5.1T   120M  100%  /grid/11
/dev/sda    5.4T  5.1T   802M  100%  /grid/0
Tail of balancer output log:
18/11/19 12:12:02 INFO balancer.Dispatcher: Successfully moved blk_1107025919_33285238 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodeg:50010
18/11/19 12:12:02 INFO balancer.Dispatcher: Start moving blk_1107022998_33282317 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodeg:50010
18/11/19 12:12:07 INFO balancer.Dispatcher: Successfully moved blk_1107025997_33285316 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodeg:50010
18/11/19 12:12:07 INFO balancer.Dispatcher: Start moving blk_1107022634_33281953 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodej:50010
Created 01-06-2019 09:21 PM
@Joshua Adeleke
You didn't update this thread. If the answer resolved your issue, can you accept it to close the thread?