10-18-2016 07:03 AM
Hi. Some of my datanodes have disks of different sizes. For example:
/dev/sdc1 918G 384G 534G 42% /data/disk1
/dev/sdd1 459G 381G 78G 84% /data/disk2
/dev/sde1 459G 391G 69G 86% /data/disk3
/dev/sdf1 459G 389G 70G 85% /data/disk4
My understanding is that there is currently no functionality for balancing within a datanode, so I'd have to move data around manually. I've found this article on performing the procedure: http://www-01.ibm.com/support/docview.wss?uid=swg21702775 (Procedure 1). Has anyone actually done this (or something similar)? Can you share any issues/caveats you ran across? Is this the best way to do it? If the other 3 disks fill up, will that datanode continue to write to disk1?
10-18-2016 11:08 AM
I feel that the manual procedure you described carries its own inherent risk.
Since CDH 5.8.2, you can use a new HDFS feature, the intra-datanode disk balancer, to do exactly what you asked for. We also have a new blog post about this feature:
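To get a feel for how much data would have to move on the datanode quoted in the question, here is a rough back-of-the-envelope sketch. It assumes that "balanced" means every disk ends up at the same percentage used, which is a simplification and not a description of how the balancer itself decides. The sizes come from the df output above; the function name is made up for illustration.

```python
# Sizes and usage in G, copied from the df output in the question.
disks = {
    "/data/disk1": {"size": 918, "used": 384},
    "/data/disk2": {"size": 459, "used": 381},
    "/data/disk3": {"size": 459, "used": 391},
    "/data/disk4": {"size": 459, "used": 389},
}


def even_utilization_targets(disks):
    """Return the node-wide utilization ratio and, per disk, how much data
    it would need to gain (positive) or lose (negative) to reach that ratio.

    Assumption: the goal is an equal percentage used on every disk.
    """
    total_size = sum(d["size"] for d in disks.values())
    total_used = sum(d["used"] for d in disks.values())
    ratio = total_used / total_size
    deltas = {
        mount: d["size"] * ratio - d["used"] for mount, d in disks.items()
    }
    return ratio, deltas


ratio, deltas = even_utilization_targets(disks)
print(f"target utilization: {ratio:.1%}")
for mount, delta in deltas.items():
    action = "receive" if delta > 0 else "give up"
    print(f"{mount}: {action} about {abs(delta):.0f}G")
```

With the figures from the question this works out to roughly 67% utilization on every disk, with disk1 taking on about 234G from the three smaller disks.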
10-20-2016 07:03 AM