Posts: 28
Registered: ‎09-20-2017

diskbalancer as a background job while on production




I am managing a CDH 5.13 cluster with 4 datanodes. Each datanode had 10 x 2.7 TB disks (~90% used) and we just added another 8 x 3.6 TB disks on each node.


I ran a "Rebalance" on the HDFS service, which apparently did nothing, since all nodes already have the same total disk usage.


I then followed this post to balance the disks within each node (intra-node balancing with diskbalancer, threshold set to 25). After 1 hour of execution, progress is very slow, as you can see from /data/18, the only new disk that has received data so far:



$ sudo df -h
/dev/sdc1                    2.8T  2.4T  361G  88% /data/02
/dev/sdk1                    2.8T  2.5T  315G  89% /data/10
/dev/sdg1                    2.8T  2.5T  308G  89% /data/07
/dev/sdi1                    2.8T  2.5T  314G  89% /data/08
/dev/sdj1                    2.8T  2.5T  300G  90% /data/09
/dev/sde1                    2.8T  2.5T  299G  90% /data/04
/dev/sdf1                    2.8T  2.5T  303G  90% /data/06
/dev/sdh1                    2.8T  2.4T  353G  88% /data/05
/dev/sdb1                    2.8T  806G  2.0T  29% /data/01
/dev/sdd1                    2.8T  2.5T  298G  90% /data/03
---#NEW DISKS#---
/dev/sdl1                    3.7T   35M  3.7T   1% /data/11
/dev/sdm1                    3.7T   36M  3.7T   1% /data/12
/dev/sdn1                    3.7T   34M  3.7T   1% /data/13
/dev/sdo1                    3.7T   35M  3.7T   1% /data/14
/dev/sdp1                    3.7T   34M  3.7T   1% /data/15
/dev/sdq1                    3.7T   34M  3.7T   1% /data/16
/dev/sdr1                    3.7T   34M  3.7T   1% /data/17
/dev/sds1                    3.7T   26G  3.7T   1% /data/18
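
As a rough sanity check on these numbers, the sketch below turns "26G after about 1 hour" into a copy rate and into hours per TiB at that pace. It assumes the 26G on /data/18 was all written during that one hour and that df's G means GiB; the function names are made up for illustration. As far as I know, the disk balancer throttle (dfs.disk.balancer.max.disk.throughputInMBperSec, or -bandwidth when creating the plan) defaults to 10 MB/s, so a rate in this range would point at the throttle rather than at the hardware.

```shell
#!/bin/sh
# Back-of-the-envelope estimate of the disk balancer copy rate.
# Assumption: the 26G on /data/18 was written in about one hour (3600 s).

# Print the copy rate in MiB/s for <GiB copied> in <seconds elapsed>.
observed_rate_mib_s() {
    awk -v gib="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", gib * 1024 / secs }'
}

# Print how many hours it would take to move 1 TiB at the same pace.
hours_per_tib() {
    awk -v gib="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", (1024 / gib) * secs / 3600 }'
}

echo "observed rate: $(observed_rate_mib_s 26 3600) MiB/s"
echo "hours per TiB: $(hours_per_tib 26 3600)"
```

With the figures above this prints roughly 7.4 MiB/s and 39.4 hours per TiB.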



I would like to ask the following:


 1. Currently, there are no pipelines accessing HDFS, but tomorrow morning there will be, and it is clear from the progress so far that disk balancing will not have finished by then. Is it safe to let this process run to completion while the cluster is in production?


 2. Is there something I can do to speed things up?


 3. How can I safely terminate this process if required?


Thank you,



Re: diskbalancer as a background job while on production


I am copying this from the Apache documentation:


"A plan can be executed against an operational data node. Disk balancer should not interfere with other processes since it throttles how much data is copied every second."


Does "should not" mean "does not" here, or does it mean "other processes should not run while the balancer runs"?
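
For reference, here is a minimal sketch of the diskbalancer subcommands that, going by the same Apache documentation, relate to checking on, cancelling, and re-planning a run. The hostname, the plan file path, and the bandwidth value of 50 are placeholders I made up, not values from this cluster. The script only prints the commands unless DRY_RUN is set to 0, and I have not verified how the CDH 5.13 build behaves for each option.

```shell
#!/bin/sh
# Sketch of diskbalancer commands for monitoring, cancelling, and re-planning.
# Hypothetical placeholders: replace DATANODE and PLAN_FILE with real values.
DATANODE="${DATANODE:-datanode1.example.com}"
PLAN_FILE="${PLAN_FILE:-/path/to/datanode1.example.com.plan.json}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the status of the plan currently running on the datanode.
run hdfs diskbalancer -query "$DATANODE"

# Cancel the running plan by pointing at the plan file that was executed.
run hdfs diskbalancer -cancel "$PLAN_FILE"

# Create a new plan with a higher per-disk bandwidth (MB/s; 50 is an example)
# and the same threshold of 25 used earlier.
run hdfs diskbalancer -plan "$DATANODE" -bandwidth 50 -thresholdPercentage 25
```

A higher -bandwidth trades balancing time against I/O left for production jobs, so the value is something to tune rather than copy.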