
Issues with HDFS Rebalance

Expert Contributor

Hi,

I am having some issues rebalancing my HDF cluster (running 2.6). One node's data directories are 100% full. I started the HDFS balancer from Ambari and also ran the balancer command (hdfs balancer), but I have not seen any change on that node ...

What is the way forward, please?


Re: Issues with HDFS Rebalance

Mentor

@Joshua Adeleke

Could you share the architecture of your cluster, i.e. the number of master nodes and DataNodes?

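A DataNode picks the disk for each new block using one of 2 policies. These are volume-choosing policies (dfs.datanode.fsdataset.volume.choosing.policy), not modes of the balancer itself:

Round-robin [default]: It distributes new blocks uniformly across the available disks.

Available space: It writes data to the disk that has the most free space (by percentage).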
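
To make the difference between the two policies concrete, here is a toy Python model. It is only a sketch under simplifying assumptions: the volume names and sizes are made up, and `round_robin_order` and `pick_most_free` are hypothetical helpers, not Hadoop's implementation (the real available-space policy is more nuanced).

```python
from itertools import cycle, islice

# Hypothetical volumes: mount point -> (free bytes, capacity bytes)
VOLUMES = {
    "/grid/0": (1 * 2**30, 100 * 2**30),
    "/grid/1": (40 * 2**30, 100 * 2**30),
    "/grid/2": (10 * 2**30, 100 * 2**30),
}

def round_robin_order(volumes, count):
    """Return the next `count` volumes in turn, ignoring how full they are."""
    return list(islice(cycle(sorted(volumes)), count))

def pick_most_free(volumes):
    """Return the volume with the largest free-space percentage."""
    return max(volumes, key=lambda name: volumes[name][0] / volumes[name][1])

print(round_robin_order(VOLUMES, 4))
print(pick_most_free(VOLUMES))
```

In this model, round-robin keeps returning to /grid/0 even though it is nearly full, while the free-space rule steers the next block to /grid/1.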

If the property below does not exist, add it; otherwise modify it so that your disks never reach 100% full:

dfs.datanode.du.reserved=xx (reserved space in bytes per volume). 

HDFS will then always leave the specified amount of space free on every DataNode disk.
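
Because the value is given in bytes per volume, it is easy to get the number of zeros wrong. The following Python sketch converts a size in GiB into the byte value. The 10 GiB figure and the 12-volume count are only illustrative assumptions, not a recommendation, and `reserved_setting` is a hypothetical helper.

```python
GIB = 1024 ** 3  # bytes in one GiB

def reserved_bytes(gib_per_volume):
    """Convert a per-volume reservation in GiB to bytes."""
    return gib_per_volume * GIB

def reserved_setting(gib_per_volume):
    """Format the property as key=value with the value in bytes."""
    return f"dfs.datanode.du.reserved={reserved_bytes(gib_per_volume)}"

# Assumed example: reserve 10 GiB on each of 12 volumes.
print(reserved_setting(10))
print(12 * reserved_bytes(10), "bytes reserved across 12 volumes")
```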

Have you set the parameter dfs.disk.balancer.enabled=true in hdfs-site.xml? Can you also share the output of

$ hdfs dfsadmin -report 

Did you run the balancer with a threshold?

$ hdfs balancer -threshold -help 

Output: Expecting a number in the range of [1.0, 100.0]: -help

Now run:

$ hdfs balancer -threshold 9.0 

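Note that a threshold of 9.0 does not cap the disks at 90% full. It tells the balancer to move blocks until each DataNode's utilization is within 9 percentage points of the cluster's average utilization.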
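
As a rough illustration of how a threshold is applied, the Python sketch below compares each node's utilization with the cluster average and flags nodes that fall outside the average plus or minus the threshold. The node names and byte counts are made up, and `classify` is a hypothetical helper that simplifies what the balancer actually does.

```python
def classify(nodes, threshold):
    """Label each node relative to the cluster-wide average utilization.

    `nodes` maps a node name to (used_bytes, capacity_bytes).
    `threshold` is in percentage points, as passed to -threshold.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    average = 100.0 * total_used / total_capacity

    labels = {}
    for name, (used, capacity) in nodes.items():
        utilization = 100.0 * used / capacity
        if utilization > average + threshold:
            labels[name] = "over-utilized"
        elif utilization < average - threshold:
            labels[name] = "under-utilized"
        else:
            labels[name] = "within threshold"
    return average, labels

# Hypothetical four-node cluster, 100 units of capacity each.
nodes = {"a": (95, 100), "b": (60, 100), "c": (55, 100), "d": (30, 100)}
average, labels = classify(nodes, 9.0)
print(average, labels)
```

In this model the cluster counts as balanced only when no node is labelled over-utilized or under-utilized, which is what the "0 over-utilized" and "0 underutilized" log lines report.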

output

18/11/17 21:49:16 INFO balancer.Balancer: Using a threshold of 9.0
18/11/17 21:49:16 INFO balancer.Balancer: namenodes  = [hdfs://xxxxxx:8020]
18/11/17 21:49:16 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 9.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
18/11/17 21:49:16 INFO balancer.Balancer: included nodes = []
18/11/17 21:49:16 INFO balancer.Balancer: excluded nodes = []
18/11/17 21:49:16 INFO balancer.Balancer: source nodes = [] 

Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved

18/11/17 21:49:17 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
18/11/17 21:49:18 INFO block.BlockTokenSecretManager: Setting block keys
18/11/17 21:49:18 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
18/11/17 21:49:18 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
18/11/17 21:49:18 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
18/11/17 21:49:18 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
18/11/17 21:49:18 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
18/11/17 21:49:18 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
18/11/17 21:49:18 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
18/11/17 21:49:18 INFO block.BlockTokenSecretManager: Setting block keys
18/11/17 21:49:18 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
18/11/17 21:49:18 INFO balancer.Balancer: dfs.blocksize = 134217728 (default=134217728)
18/11/17 21:49:18 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.0.31:50010
18/11/17 21:49:18 INFO balancer.Balancer: 0 over-utilized: []
18/11/17 21:49:18 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
Nov 17, 2018 9:49:18 PM           0                  0 B                 0 B                0 B
Nov 17, 2018 9:49:18 PM  Balancing took 2.981 seconds

Then check the result again. Depending on the data size, the rebalance could take quite a long time.

Re: Issues with HDFS Rebalance

Expert Contributor

@Geoffrey Shelton Okot

Thanks for your response. My cluster has 11 nodes (3 master and 8 worker nodes). Yes, I ran the balancer with a threshold of 5. It has been running since Friday morning and is still going...

My Datanode:
/dev/sdb              5.4T  5.1T   17M 100% /grid/1
/dev/sdc              5.4T  5.1T  263M 100% /grid/2
/dev/sdd              5.4T  5.1T  912M 100% /grid/3
/dev/sde              5.4T  5.1T  283M 100% /grid/4
/dev/sdf              5.4T  5.1T   95M 100% /grid/5
/dev/sdg              5.4T  5.1T  388M 100% /grid/6
/dev/sdh              5.4T  5.1T   22G 100% /grid/7
/dev/sdi              5.4T  5.1T  694M 100% /grid/8
/dev/sdj              5.4T  5.1T  843M 100% /grid/9
/dev/sdk              5.4T  5.1T   36M 100% /grid/10
/dev/sdl              5.4T  5.1T  120M 100% /grid/11
/dev/sda              5.4T  5.1T  802M 100% /grid/0
Tail of the balancer output log:
18/11/19 12:12:02 INFO balancer.Dispatcher: Successfully moved blk_1107025919_33285238 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodeg:50010
18/11/19 12:12:02 INFO balancer.Dispatcher: Start moving blk_1107022998_33282317 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodeg:50010
18/11/19 12:12:07 INFO balancer.Dispatcher: Successfully moved blk_1107025997_33285316 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodeg:50010
18/11/19 12:12:07 INFO balancer.Dispatcher: Start moving blk_1107022634_33281953 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodej:50010
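
To see how much data a long-running balancer has actually moved, the "Successfully moved" lines can be totalled per source and target. The Python sketch below does this for the two completed moves in the tail above. The regular expression is an assumption based only on the line format shown here, and `bytes_moved_by_route` is a hypothetical helper.

```python
import re
from collections import Counter

# Pattern inferred from the Dispatcher lines in this log tail (an assumption).
MOVED = re.compile(
    r"Successfully moved (?P<block>blk_\d+_\d+) with size=(?P<size>\d+) "
    r"from (?P<src>\S+) to (?P<dst>\S+) through (?P<proxy>\S+)"
)

def bytes_moved_by_route(lines):
    """Sum the bytes of completed moves for each (source, target) pair."""
    totals = Counter()
    for line in lines:
        match = MOVED.search(line)
        if match:
            totals[(match["src"], match["dst"])] += int(match["size"])
    return totals

log_tail = [
    "18/11/19 12:12:02 INFO balancer.Dispatcher: Successfully moved blk_1107025919_33285238 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodeg:50010",
    "18/11/19 12:12:02 INFO balancer.Dispatcher: Start moving blk_1107022998_33282317 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodeg:50010",
    "18/11/19 12:12:07 INFO balancer.Dispatcher: Successfully moved blk_1107025997_33285316 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodeg:50010",
]
totals = bytes_moved_by_route(log_tail)
for (src, dst), size in totals.items():
    print(f"{src} -> {dst}: {size} bytes")
```

Run against the full log rather than this short tail, the same totals give a sense of how quickly the full node is being drained.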

Re: Issues with HDFS Rebalance

Mentor

@Joshua Adeleke
You have not updated this thread. If the answer resolved your issue, could you accept it so the thread can be closed?