
Issues with HDFS Rebalance


Expert Contributor

Hi,

I am having trouble rebalancing my HDF cluster (version 2.6). There is a node whose data directory is 100% full. I ran the HDFS balancer from Ambari and also ran the balancer command (hdfs balancer) from the command line, but I have not seen any change on that node ...

What is the way forward, please?


Re: Issues with HDFS Rebalance

Mentor

@Joshua Adeleke

Could you share the architecture of your cluster, that is, the number of master nodes and DataNodes?

When a DataNode writes a new block, HDFS picks the disk (volume) using one of two policies:

Round-robin [default]: distributes new blocks evenly across the available disks, regardless of how full each one is.

Available space: prefers the disks that have the most free space.
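The two placement methods above can be pictured with a toy model. This is only a sketch of the idea as described in this reply, not Hadoop's actual volume-choosing classes, which apply additional rules; the disk names and free-space numbers are made up for illustration.

```python
from itertools import cycle

# Hypothetical free space per disk, in GB (illustrative values only).
free_gb = {"disk-a": 50, "disk-b": 400, "disk-c": 10}


def round_robin(volumes):
    """Rotate through the disks in a fixed order, ignoring free space."""
    return cycle(volumes)


def most_free(free_by_volume):
    """Pick the disk that currently has the most free space."""
    return max(free_by_volume, key=free_by_volume.get)


rr = round_robin(list(free_gb))
picks_rr = [next(rr) for _ in range(4)]  # four consecutive block placements
pick_free = most_free(free_gb)

print("round-robin picks:", picks_rr)
print("available-space pick:", pick_free)
```

In this model round-robin keeps writing to disk-c even though it is nearly full, which is how individual disks can fill up unevenly.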

To make sure your disks never reach 100% full, set the property below (add it if it doesn't exist):

dfs.datanode.du.reserved=xx (reserved space in bytes per volume).

HDFS will then always leave the specified amount of space unused on every DataNode disk.
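Because the value is given in bytes, it is easy to get the number of digits wrong. A small sketch for working it out, assuming you want to reserve 50 GiB per volume (the 50 GiB figure is only an example, not a recommendation from this thread):

```python
GIB = 1024 ** 3  # bytes in one GiB


def reserved_bytes(gib):
    """Convert a per-volume reservation in GiB to bytes."""
    return gib * GIB


# Assumed example: reserve 50 GiB on each volume.
value = reserved_bytes(50)
print(f"dfs.datanode.du.reserved={value}")
```

The printed number is what would go in place of xx above.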

Have you set this parameter in hdfs-site.xml?

dfs.disk.balancer.enabled=true

Can you share the output of

$ hdfs dfsadmin -report 

Did you run the balancer with a threshold?

$ hdfs balancer -threshold -help 

Output:

Expecting a number in the range of [1.0, 100.0]: -help

Now run

$ hdfs balancer -threshold 9.0 

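With a threshold of 9.0, the balancer tries to bring the utilization of every DataNode to within 9 percentage points of the cluster's average utilization.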

output

18/11/17 21:49:16 INFO balancer.Balancer: Using a threshold of 9.0
18/11/17 21:49:16 INFO balancer.Balancer: namenodes  = [hdfs://xxxxxx:8020]
18/11/17 21:49:16 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 9.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
18/11/17 21:49:16 INFO balancer.Balancer: included nodes = []
18/11/17 21:49:16 INFO balancer.Balancer: excluded nodes = []
18/11/17 21:49:16 INFO balancer.Balancer: source nodes = [] 

Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved

18/11/17 21:49:17 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
18/11/17 21:49:18 INFO block.BlockTokenSecretManager: Setting block keys
18/11/17 21:49:18 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
18/11/17 21:49:18 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
18/11/17 21:49:18 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
18/11/17 21:49:18 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
18/11/17 21:49:18 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
18/11/17 21:49:18 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
18/11/17 21:49:18 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
18/11/17 21:49:18 INFO block.BlockTokenSecretManager: Setting block keys
18/11/17 21:49:18 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
18/11/17 21:49:18 INFO balancer.Balancer: dfs.blocksize = 134217728 (default=134217728)
18/11/17 21:49:18 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.0.31:50010
18/11/17 21:49:18 INFO balancer.Balancer: 0 over-utilized: []
18/11/17 21:49:18 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
Nov 17, 2018 9:49:18 PM           0                  0 B                 0 B                0 B
Nov 17, 2018 9:49:18 PM  Balancing took 2.981 seconds
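The "over-utilized" and "underutilized" counts in the output above depend on how far each DataNode is from the cluster average. Here is a simplified sketch of that comparison, assuming the average is total used space divided by total capacity; the node names and numbers are hypothetical, and the real balancer also distinguishes further categories.

```python
def classify(nodes, threshold):
    """Classify nodes by distance from the cluster-average utilization.

    nodes: {name: (used, capacity)} in any consistent unit.
    threshold: allowed deviation in percentage points.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    avg = 100.0 * total_used / total_cap

    states = {}
    for name, (used, cap) in nodes.items():
        util = 100.0 * used / cap
        if util > avg + threshold:
            states[name] = "over-utilized"
        elif util < avg - threshold:
            states[name] = "under-utilized"
        else:
            states[name] = "within threshold"
    return avg, states


# Hypothetical three-node cluster, values in TB.
nodes = {"node-a": (95, 100), "node-b": (60, 100), "node-c": (55, 100)}
avg, states = classify(nodes, threshold=9.0)

print(f"cluster average: {avg:.1f}%")
for name, state in states.items():
    print(name, state)
```

A smaller threshold makes more nodes count as out of balance, so the balancer has more data to move and runs longer.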

Run it on your cluster and check again. Depending on the amount of data, balancing can take quite a long time.

Re: Issues with HDFS Rebalance

Expert Contributor

@Geoffrey Shelton Okot

Thanks for your response. My cluster has 11 nodes (3 master and 8 worker nodes). Yes, I ran the balancer with a threshold of 5. I see it has been running since Friday morning...

My Datanode:
/dev/sdb              5.4T  5.1T   17M 100% /grid/1
/dev/sdc              5.4T  5.1T  263M 100% /grid/2
/dev/sdd              5.4T  5.1T  912M 100% /grid/3
/dev/sde              5.4T  5.1T  283M 100% /grid/4
/dev/sdf              5.4T  5.1T   95M 100% /grid/5
/dev/sdg              5.4T  5.1T  388M 100% /grid/6
/dev/sdh              5.4T  5.1T   22G 100% /grid/7
/dev/sdi              5.4T  5.1T  694M 100% /grid/8
/dev/sdj              5.4T  5.1T  843M 100% /grid/9
/dev/sdk              5.4T  5.1T   36M 100% /grid/10
/dev/sdl              5.4T  5.1T  120M 100% /grid/11
/dev/sda              5.4T  5.1T  802M 100% /grid/0
Tail of the balancer output log:
18/11/19 12:12:02 INFO balancer.Dispatcher: Successfully moved blk_1107025919_33285238 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodeg:50010
18/11/19 12:12:02 INFO balancer.Dispatcher: Start moving blk_1107022998_33282317 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodeg:50010
18/11/19 12:12:07 INFO balancer.Dispatcher: Successfully moved blk_1107025997_33285316 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodeg:50010
18/11/19 12:12:07 INFO balancer.Dispatcher: Start moving blk_1107022634_33281953 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodej:50010
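One way to see how much a long-running balancer has actually moved is to add up the "Successfully moved" lines. A minimal sketch, assuming the log lines have the same shape as the ones quoted above; the regular expression is written against those four sample lines only and may need adjusting for other Hadoop versions.

```python
import re
from collections import defaultdict

# Pattern based on the Dispatcher lines quoted in this thread (an assumption
# about the log format, not an official specification).
MOVED = re.compile(
    r"Successfully moved (?P<block>blk_\d+_\d+) with size=(?P<size>\d+) "
    r"from (?P<src>\S+) to (?P<dst>\S+) through (?P<proxy>\S+)"
)


def summarize(lines):
    """Sum the bytes moved per (source host, target host) pair."""
    totals = defaultdict(int)
    for line in lines:
        match = MOVED.search(line)
        if match:
            src = match.group("src").split(":")[0]
            dst = match.group("dst").split(":")[0]
            totals[(src, dst)] += int(match.group("size"))
    return dict(totals)


sample = [
    "18/11/19 12:12:02 INFO balancer.Dispatcher: Successfully moved blk_1107025919_33285238 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodeg:50010",
    "18/11/19 12:12:02 INFO balancer.Dispatcher: Start moving blk_1107022998_33282317 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodeg:50010",
    "18/11/19 12:12:07 INFO balancer.Dispatcher: Successfully moved blk_1107025997_33285316 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodeg:50010",
    "18/11/19 12:12:07 INFO balancer.Dispatcher: Start moving blk_1107022634_33281953 with size=134217728 from nodeg:50010:DISK to nodeh:50010:DISK through nodej:50010",
]

totals = summarize(sample)
for (src, dst), moved in totals.items():
    print(f"{src} -> {dst}: {moved / 1024 ** 2:.0f} MiB moved")
```

To use it on a real log, pass the lines of the balancer log file to summarize() instead of the sample list.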

Re: Issues with HDFS Rebalance

Mentor

@Joshua Adeleke
You haven't updated this thread. If the answer resolved your issue, can you accept it to close the thread?