Rack Awareness best practices for XXL cluster (700 Plus nodes)


We have a 700-plus node setup with rack awareness configured. The total number of racks is ~40 (each rack name is set to that of the physical rack), with a replication factor of 3. I am keen to find out what observations other users have had with respect to the HDFS balancer. Our setup has a number of under- and over-utilized nodes (utilization: mean 80% / min 20% / max 90%), and the balancer keeps running forever. We have tuned a number of balancer parameters, but have observed no improvement.


Command sample :  hdfs balancer -Ddfs.balancer.movedWinWidth=5400000 -Ddfs.balancer.moverThreads=20000 -Ddfs.balancer.dispatcherThreads=200 -Ddfs.balance.bandwidthPerSec=200000000 -Ddfs.balancer.getBlocks.size=1000000000000 -Ddfs.datanode.balance.max.concurrent.moves=32 -Ddfs.balancer.getBlocks.min-block-size=536870912 -threshold 20  


In my observation, the bytes being moved do not necessarily reflect the progress; also, the bytes left to move sometimes increase rather than decrease.
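
For anyone reading along, a rough sketch of how `-threshold 20` partitions nodes may help explain this behaviour. It assumes the commonly documented rule that a DataNode counts as balanced when its utilization is within `threshold` percentage points of the cluster-wide utilization. The function name `classify` and the sample utilizations are illustrative only, and the 80% figure is the mean quoted above, which may differ from the capacity-weighted average the balancer actually works with.

```python
def classify(node_util_pct, cluster_avg_pct, threshold_pct):
    """Bucket one DataNode into the four groups usually described for the balancer."""
    if node_util_pct > cluster_avg_pct + threshold_pct:
        return "over-utilized"      # must give up data
    if node_util_pct > cluster_avg_pct:
        return "above-average"      # within threshold, may act as a source
    if node_util_pct >= cluster_avg_pct - threshold_pct:
        return "below-average"      # within threshold, may act as a target
    return "under-utilized"         # must receive data


# Assumed inputs: mean utilization from the post, -threshold from the command.
avg, threshold = 80.0, 20.0
for util in (20.0, 60.0, 80.0, 90.0):
    print(f"{util:5.1f}% -> {classify(util, avg, threshold)}")
```

Under these assumptions no node can be over-utilized (the cutoff would be 100%), so only nodes below 60% fall outside the threshold, and the 90% nodes are merely above average.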


Env : CDH 5.5.1 (CentOS 6)

Data size : ~30 PB


Super Collaborator

Hi @Hobster, are your nodes well distributed across the racks? Do all the nodes have the same storage? Which DataNode volume choosing policy are you using? How many nodes are at 20%?


Hi @Fawze. Your line of thought is spot on. Looking at our configs it does call for some changes to be made.


The details are as follows:


DataNode volume choosing policy = round robin (I reckon available space would be better?)


Do all nodes have the same storage? Actually, we have 3 HDD templates: some nodes have 8 disks, some 10 and some 12, with each individual HDD at ~10 TB.

(This might be throwing off the balancer. I wonder whether the Hadoop framework looks at individual disk capacity for balancing rather than the node as a whole unit of storage?)


Nodes under 20% = approx. 100 (they have stayed at around that level even after running the balancer continuously for weeks)


Here's the node distribution -


Rack name                nodes


/dc2/dc2-rack-126-12 37
/dc2/dc2-rack-127-09 38
/dc2/dc2-rack-127-10 38
/dc2/dc2-rack-127-11 38
/dc2/dc2-rack-127.02 1
/dc2/dc2-rack-128-12 37
/dc2/dc2-rack-129-02 18
/dc2/dc2-rack-129-03 19
/dc2/dc2-rack-129-04 20
/dc2/dc2-rack-129-05 20
/dc2/dc2-rack-129-06 18
/dc2/dc2-rack-129-07 40
/dc2/dc2-rack-129-08 20
/dc2/dc2-rack-129-09 19
/dc2/dc2-rack-CA50 36
/dc2/dc2-rack-CF9 22
/dc2/dc2-rack-CG9 21
/dc2/dc2-rack-CU34 46
/dc2/dc2-rack-CU37 47
/dc2/dc2-rack-CU50 37
/dc2/dc2-rack-CV50 36
/dc2/dc2-rack-CW34 46
/dc2/dc2-rack-CW37 45
/dc2/dc2-rack-CW50 36
/dc2/dc2-rack-CX50 36
/dc2/dc2-rack-CY34 46
/dc2/dc2-rack-CY37 46
/dc2/dc2-rack-CY45 12
/dc2/dc2-rack-CY50 36
/dc2/dc2-rack-CZ50 36
/dc2/dc2-rack-DF91 1
/dc2/dc2-rack-r126c6 3


Let me know your thoughts, and thank you so much for chipping in 🙂






Super Collaborator
Yes, you should move to available space.

Regarding the rebalance, I think that since CDH 5.8 you can also rebalance
inside a node, between its disks.
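
On the earlier question of disks versus whole nodes: as far as I understand it, the balancer compares each DataNode's utilization percentage (used space over capacity, summed across that node's disks of one storage type) rather than individual disks. A small sketch, assuming the ~10 TB per disk and the 8/10/12-disk templates mentioned above (the helper names are made up for illustration):

```python
DISK_TB = 10  # approximate per-disk capacity quoted in the post


def node_capacity_tb(disks):
    """Raw capacity of one node, treating all of its disks as one pool."""
    return disks * DISK_TB


def used_tb_at(disks, util_pct):
    """Data held by a node of this template at a given utilization percentage."""
    return node_capacity_tb(disks) * util_pct / 100


for disks in (8, 10, 12):
    print(f"{disks:2d} disks: {node_capacity_tb(disks)} TB raw, "
          f"{used_tb_at(disks, 80)} TB used at 80%")
```

So the same 80% target means 64, 80 or 96 TB depending on the template, and evening out the disks within one node is a separate job from the cluster-wide balancer.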

I can see that some racks have only one or a few nodes, and your servers are
not well balanced between racks: some have 20, others 41. I am almost sure
your issue comes from this.
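
To put numbers on the imbalance, here is a quick sketch that summarizes the rack listing pasted earlier in the thread. The data is copied from that listing; the `parse` helper and the "3 or fewer nodes" cutoff are arbitrary choices for illustration.

```python
from statistics import median

# Rack listing as pasted in the thread: "<rack path> <node count>"
RACKS = """\
/dc2/dc2-rack-126-12 37
/dc2/dc2-rack-127-09 38
/dc2/dc2-rack-127-10 38
/dc2/dc2-rack-127-11 38
/dc2/dc2-rack-127.02 1
/dc2/dc2-rack-128-12 37
/dc2/dc2-rack-129-02 18
/dc2/dc2-rack-129-03 19
/dc2/dc2-rack-129-04 20
/dc2/dc2-rack-129-05 20
/dc2/dc2-rack-129-06 18
/dc2/dc2-rack-129-07 40
/dc2/dc2-rack-129-08 20
/dc2/dc2-rack-129-09 19
/dc2/dc2-rack-CA50 36
/dc2/dc2-rack-CF9 22
/dc2/dc2-rack-CG9 21
/dc2/dc2-rack-CU34 46
/dc2/dc2-rack-CU37 47
/dc2/dc2-rack-CU50 37
/dc2/dc2-rack-CV50 36
/dc2/dc2-rack-CW34 46
/dc2/dc2-rack-CW37 45
/dc2/dc2-rack-CW50 36
/dc2/dc2-rack-CX50 36
/dc2/dc2-rack-CY34 46
/dc2/dc2-rack-CY37 46
/dc2/dc2-rack-CY45 12
/dc2/dc2-rack-CY50 36
/dc2/dc2-rack-CZ50 36
/dc2/dc2-rack-DF91 1
/dc2/dc2-rack-r126c6 3
"""


def parse(listing):
    """Turn the pasted listing into {rack_name: node_count}."""
    racks = {}
    for line in listing.strip().splitlines():
        name, count = line.rsplit(None, 1)
        racks[name] = int(count)
    return racks


racks = parse(RACKS)
sizes = sorted(racks.values())
tiny = sorted(name for name, n in racks.items() if n <= 3)

print(f"{len(racks)} racks, {sum(sizes)} nodes")
print(f"min {sizes[0]}, median {median(sizes)}, max {sizes[-1]}")
print("racks with 3 or fewer nodes:", tiny)
```

As pasted, the listing adds up to 32 racks and 951 nodes, with rack sizes ranging from 1 to 47 and three racks holding 3 or fewer nodes.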


Sounds good @Fawze. We're planning on upgrading to CDH 5.16.1 (thank you for your views on that one too 🙂), and I will check with our DC team whether they can rebalance the racks so that each holds the same number of hosts. I'll also move to the recommended policy.

Thank you so much!