Some of our DataNodes have only half the disk capacity of the other nodes, but HDFS balancing seems to put the same absolute amount of data on every node. This packs the smaller nodes close to full, while the larger nodes still have plenty of spare space.
Is there any way to configure HDFS to distribute data according to each node's relative capacity usage?
Thanks in advance. Cheers
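To make the difference concrete, here is a small Python sketch using hypothetical node sizes (4 TB and 8 TB, matching the "half the capacity" ratio) and a hypothetical 12 TB of stored data. These numbers are assumptions for illustration, not measurements. It compares placing the same absolute amount on every node with placing data in proportion to each node's capacity.

```python
# Hypothetical cluster: two small DataNodes (4 TB) and two large ones (8 TB).
# All numbers are illustrative assumptions, not measurements.
capacity_tb = {"dn1": 4.0, "dn2": 4.0, "dn3": 8.0, "dn4": 8.0}
total_data_tb = 12.0  # total block data stored in the cluster (hypothetical)


def equal_absolute(capacity, total):
    """Every node stores the same absolute amount of data."""
    share = total / len(capacity)
    return {node: share for node in capacity}


def proportional(capacity, total):
    """Every node stores data in proportion to its capacity."""
    cluster_capacity = sum(capacity.values())
    return {node: total * cap / cluster_capacity for node, cap in capacity.items()}


def utilization(used, capacity):
    """Fraction of each node's capacity that is in use."""
    return {node: used[node] / capacity[node] for node in capacity}


if __name__ == "__main__":
    for name, placement in (("equal absolute", equal_absolute), ("proportional", proportional)):
        used = placement(capacity_tb, total_data_tb)
        util = utilization(used, capacity_tb)
        print(name, {node: f"{u:.1%}" for node, u in util.items()})
```

With these numbers, equal absolute placement leaves the small nodes at 75% and the large nodes at 37.5%, while a proportional split puts every node at 50%.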
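You can use the HDFS balancer (`hdfs balancer`). It balances on relative usage rather than absolute amounts: a cluster is considered balanced when each DataNode's utilization (used space divided by that node's capacity) differs from the overall cluster utilization by no more than a threshold, set with `-threshold <percentage>` (10 by default). Running it should therefore move blocks off the smaller, fuller nodes and onto the larger ones. If you only want it to act on some nodes, these options restrict its scope: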
| Option | Description |
|---|---|
| `-exclude -f <hosts-file> \| <comma-separated list of hosts>` | Excludes the specified DataNodes from being balanced by the balancer. |
| `-include -f <hosts-file> \| <comma-separated list of hosts>` | Includes only the specified DataNodes to be balanced by the balancer. |
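The balancer's notion of "balanced" can be illustrated with a simplified Python model. This is a sketch, not the balancer's actual code: node names and sizes are hypothetical, and `threshold_pct` only mirrors the meaning of the `-threshold` option. The real balancer has more detail (for example, it also distinguishes nodes that are only slightly above or below the average).

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    name: str
    used_tb: float      # space used by HDFS blocks (hypothetical value)
    capacity_tb: float  # configured capacity of the node (hypothetical value)

    @property
    def utilization_pct(self) -> float:
        return 100.0 * self.used_tb / self.capacity_tb


def classify(nodes, threshold_pct=10.0):
    """Label each node relative to the cluster-wide utilization.

    Simplified model: a node counts as balanced when its utilization is
    within threshold_pct percentage points of the cluster's utilization.
    """
    cluster_pct = 100.0 * sum(n.used_tb for n in nodes) / sum(n.capacity_tb for n in nodes)
    labels = {}
    for n in nodes:
        diff = n.utilization_pct - cluster_pct
        if diff > threshold_pct:
            labels[n.name] = "over-utilized"    # candidate source of block moves
        elif diff < -threshold_pct:
            labels[n.name] = "under-utilized"   # candidate target of block moves
        else:
            labels[n.name] = "within threshold"
    return cluster_pct, labels


if __name__ == "__main__":
    nodes = [
        DataNode("dn1", 3.0, 4.0),
        DataNode("dn2", 3.0, 4.0),
        DataNode("dn3", 3.0, 8.0),
        DataNode("dn4", 3.0, 8.0),
    ]
    avg, labels = classify(nodes, threshold_pct=10.0)
    print(f"cluster utilization: {avg:.1f}%")
    for name, label in labels.items():
        print(name, label)
```

With the same 3 TB on every node, the cluster sits at 50%, so the 4 TB nodes (75%) are over-utilized and the 8 TB nodes (37.5%) are under-utilized at a 10% threshold. This is exactly the situation the balancer is meant to correct.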
Hope this helps you.