We run Hadoop FS in a virtualized cluster: 50 DataNode VMs with 5TB each, spread across 30 or so hosts. At the time, that was the most we could fit given the compute and storage available. It has been running great, and we now want to increase our storage. Each VM has a single 5TB vdisk that sits on top of a RAID60.
We now have more storage, which lets me grow each existing DataNode disk from 5TB to 8TB. That gives me another 150TB (50 VMs × 3TB) without any additional hardware. We also have denser hosts coming, which allow about 16TB per VM for about 15 VMs spread across them. That gives me another 240TB (15 VMs × 16TB) if I can use all of it.
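A quick sanity check of the capacity arithmetic above, as a minimal sketch. The figures are raw TB taken from the post; it assumes every VM gets its full vdisk and ignores HDFS replication and any space reserved for non-HDFS use.

```python
# Raw capacity figures from the post, in TB (before HDFS replication).
EXISTING_VMS = 50
OLD_DISK_TB = 5
NEW_DISK_TB = 8
DENSE_VMS = 15
DENSE_DISK_TB = 16


def gain_from_resize(vms: int, old_tb: int, new_tb: int) -> int:
    """Extra raw capacity from growing every existing vdisk in place."""
    return vms * (new_tb - old_tb)


def gain_from_new_vms(vms: int, disk_tb: int) -> int:
    """Raw capacity contributed by newly added DataNode VMs."""
    return vms * disk_tb


current_total = EXISTING_VMS * OLD_DISK_TB
resize_gain = gain_from_resize(EXISTING_VMS, OLD_DISK_TB, NEW_DISK_TB)
dense_gain = gain_from_new_vms(DENSE_VMS, DENSE_DISK_TB)
final_total = current_total + resize_gain + dense_gain

# Share of total raw capacity that would sit on the dense-host VMs.
dense_share = dense_gain / final_total

print(current_total, resize_gain, dense_gain, final_total)
print(f"dense share: {dense_share:.1%}")
```

This prints `250 150 240 640` and `dense share: 37.5%`, which shows how much of the cluster would end up on the larger nodes.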
That all sounds good, but I'm cautious because I've read about issues with Hadoop FS nodes of different sizes. Will I run into problems if I have a mix of 5TB and 8TB nodes?
If I use the denser hosts in a similar setup, each VM on them would have 16TB available. Does Hadoop round-robin across nodes or across partitions? Could I just set up 2x8TB vdisks for the VMs backed by the larger disks?
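For reference, the 2x8TB option would normally mean mounting each vdisk separately and listing both directories in `dfs.datanode.data.dir` in hdfs-site.xml on those DataNodes. This is a minimal sketch. The mount points `/data/disk1` and `/data/disk2` are hypothetical. The volume choosing policy shown is the stock round-robin class, spelled out here only for illustration.

```xml
<!-- hdfs-site.xml on the dense-host DataNodes (sketch; paths are hypothetical) -->
<configuration>
  <property>
    <!-- One entry per vdisk mount, comma-separated -->
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/disk1/hdfs/data,file:///data/disk2/hdfs/data</value>
  </property>
  <property>
    <!-- How a DataNode picks among its own volumes; round-robin shown explicitly -->
    <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
    <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy</value>
  </property>
</configuration>
```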