Created on 01-03-2019 01:16 PM - edited 08-17-2019 03:35 PM
Hi all,
We have a large cluster with the following machines:
3 Kafka machines
3 JournalNode machines (master machines)
180 DataNode machines
Our problem is that on most of the DataNodes the disks do not have equal used space.
Example:
/dev/sdf 3.6T 442G 3.2T 13% /grid/sdf
/dev/sdc 3.6T 373G 3.3T 11% /grid/sdc
/dev/sde 3.6T 480G 3.2T 14% /grid/sde
/dev/sdi 3.6T 89M 3.6T 1% /grid/sdi
/dev/sdg 3.6T 89M 3.6T 1% /grid/sdg
/dev/sdd 3.6T 477G 3.2T 13% /grid/sdd
/dev/sdh 3.6T 89M 3.6T 1% /grid/sdh
/dev/sdb 3.6T 480G 3.2T 14% /grid/sdb
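To put a number on the skew, something like the following could be run on a DataNode. It is only a rough sketch: the function name disk_skew is made up for this example, and it assumes rows shaped like the df output above, with the usage percentage in column 5 and a mount point under /grid in column 6.

```shell
# Sketch: summarize how unevenly the /grid/* data disks are used.
# Reads df-style rows on stdin (column 5 = Use%, column 6 = mount point).
disk_skew() {
  awk '$6 ~ /^\/grid\// {
         pct = $5; sub(/%/, "", pct); pct += 0   # "13%" -> 13
         if (n == 0 || pct < min) min = pct
         if (n == 0 || pct > max) max = pct
         n++
       }
       END {
         if (n) printf "disks=%d min=%d%% max=%d%% spread=%d\n", n, min, max, max - min
       }'
}
```

Usage would be along the lines of: df -P | disk_skew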
So we rebalanced HDFS from the Ambari GUI as follows:
HDFS --> Service Actions --> Rebalance HDFS
But afterwards we saw that the disks on all workers still had the same used sizes as before,
so the rebalance apparently had no effect.
We do not understand whether the rebalance button should work here and, if not, what the reasons could be.
Is it a bug, or is there something in the HDFS configuration that we need to verify?
Created 01-06-2019 05:48 PM
HDFS 2.x provides a "balancer" utility that balances blocks across the DataNodes in the cluster. From HDFS 3.x onwards there is also a disk-level balancer (Disk Balancer) that rebalances data across the disks of a single DataNode. It is useful for correcting the skewed data distribution often seen after adding or replacing disks. Disk Balancer can be enabled by setting dfs.disk.balancer.enabled to true in hdfs-site.xml, and it is invoked by running "hdfs diskbalancer".
JIRA: https://issues.apache.org/jira/browse/HDFS-1312
For more detail: https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html
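As a rough sketch of how a Disk Balancer run is usually split into steps (HDFS 3.x only, with dfs.disk.balancer.enabled set to true): the wrapper function names below are made up for illustration, and the host name and plan path are placeholders you would supply yourself. The -plan, -execute and -query options are the ones described in the documentation linked above.

```shell
# Sketch only: wrapper names are hypothetical; the -plan / -execute / -query
# options are the documented "hdfs diskbalancer" subcommands.

# Step 1: compute a data-move plan for one DataNode.
plan_disks() {
  hdfs diskbalancer -plan "$1"      # $1 = DataNode host name
}

# Step 2: submit the plan file that step 1 reported.
run_plan() {
  hdfs diskbalancer -execute "$1"   # $1 = path to the generated plan JSON
}

# Step 3: query the status of the plan on that DataNode.
check_plan() {
  hdfs diskbalancer -query "$1"     # $1 = DataNode host name
}
```

For example, run plan_disks with the DataNode host name, pass the plan path it prints to run_plan, then poll with check_plan.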
Please accept this answer if you found it helpful.
Created 01-06-2019 09:03 PM
We can't upgrade the cluster to 3.0 for now. For the current version, 2.6.4, how do we rebalance HDFS? (You mentioned the "balancer" utility: how is it used, which tool is it, and where is it located?)
Created 01-06-2019 09:22 PM
The balancer is a subcommand of hdfs; see the usage below:
hdfs balancer [-threshold <threshold>] [-policy <policy>] [-exclude [-f <hosts-file> | <comma-separated list of hosts>]] [-include [-f <hosts-file> | <comma-separated list of hosts>]] [-idleiterations <idleiterations>]
Here is the link to the documentation
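A minimal sketch of a typical invocation, using only the -threshold option from the usage above. The function name run_balancer is made up for this example, and 10 (percent) is the documented default threshold. Note that this evens out usage between DataNodes, not between the disks inside one DataNode.

```shell
# Sketch: run the HDFS balancer with an explicit threshold.
run_balancer() {
  threshold="${1:-10}"   # allowed deviation in percent; falls back to 10
  hdfs balancer -threshold "$threshold"
}
```

For example, run_balancer 5 asks for a tighter balance than the default. It is normally run as the HDFS superuser (on many installations that is the hdfs account, but that is an assumption about your setup).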
Created 01-06-2019 09:34 PM
OK, give me some time to try this.
Second, can you help me with my latest thread? https://community.hortonworks.com/questions/231336/kafka-broker-does-not-restart.html
Created 01-14-2019 08:51 PM
The balancer command works between nodes, but is there any command to balance space between partitions on the same node?
We had an issue where a partition was read-only; once it was fixed, its used space started increasing, but the partitions are still not balanced.