Created on 02-13-2017 12:23 PM - edited 09-16-2022 04:04 AM
I have a cluster with two kinds of data nodes: one kind has 48 TB of disk and the other has 7 TB. Will the HDFS block placement strategy consider the free disk space on each data node?
If I run a Spark job, will the final write operation take free disk space into consideration?
I could not find the answer myself. Could you point me to some documentation on this?
Thanks
Created on 02-13-2017 07:02 PM - edited 02-13-2017 07:04 PM
Do you have a rack configuration?
If so, HDFS will follow the rack-aware placement strategy you configured; if not, it falls back to the default block placement policy.
The block size stays the same on all data nodes, since block size is defined for the whole cluster, not per individual node.
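For reference, the cluster-wide default block size is set in hdfs-site.xml (the value below, 128 MB, is just an illustrative example; your cluster may use a different default):

```xml
<!-- hdfs-site.xml: dfs.blocksize applies cluster-wide, not per data node -->
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value> <!-- 128 MB -->
</property>
```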
Created on 02-13-2017 08:34 PM - edited 02-13-2017 08:38 PM
In HDFS, you tell it which disks to use and it will fill up those disks. There is the ability to set how much space on those disks is reserved for non-DFS data, but that doesn't actually prevent the disks from being filled up.
The issue at hand is that the smaller disks will fill up faster, so at some point those nodes will not accept any more writes and the cluster will have no way to balance itself out. This causes issues with HDFS replication and placement, along with hotspotting in MR, Spark, and any other jobs. Say, for instance, 80% of your jobs operate primarily on the last day's worth of data. At some point you will hit critical mass where those jobs are running mostly on the same set of nodes.
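To see why the smaller nodes fill up first, here is a simplified simulation. This is not the actual HDFS placement code, just a sketch with a made-up two-node cluster: it assumes a policy that picks any node with room, without weighting by remaining free space, which is roughly how the default policy behaves here.

```python
import random

def place_blocks(nodes, block_size, num_blocks, seed=0):
    """Simulate naive placement: each block goes to a random node that
    still has room; free space is not used as a weighting factor."""
    rng = random.Random(seed)
    used = {name: 0 for name in nodes}
    for _ in range(num_blocks):
        candidates = [n for n, cap in nodes.items()
                      if used[n] + block_size <= cap]
        if not candidates:
            break  # cluster is full
        used[rng.choice(candidates)] += block_size
    return used

# Hypothetical capacities in GB: one 48 TB node, one 7 TB node.
nodes = {"big": 48_000, "small": 7_000}
used = place_blocks(nodes, block_size=1, num_blocks=40_000)
# The small node hits capacity long before the big one, so every
# later write (and the reads of that recent data) lands on "big" only.
```

Once the small node is full, all new blocks concentrate on the remaining nodes, which is exactly the hotspotting described above.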
You could set the reserved non-DFS space to different values using Host Templates in CM. That would at least warn you as the smaller disks approach capacity, but at that point the larger disks would have free space that isn't being used.
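The reserved non-DFS space is controlled by a per-data-node property; you could give the host group with the smaller disks a different value (the 10 GB figure below is just an example):

```xml
<!-- hdfs-site.xml: bytes per volume reserved for non-DFS use -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value> <!-- 10 GB -->
</property>
```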
This is why it is strongly encouraged not to mix different hardware. If possible, upgrade the smaller set. Another possible option would be to use heterogeneous storage. With it you can designate pools, so the larger nodes would be in one pool and the smaller nodes in another. Each ingestion point would need to specify which pool it uses, and you can set how many replicas go to each. This is a big architectural change, though, and should be carefully reviewed to see whether it benefits your use case(s) in any way.
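As a rough sketch of the heterogeneous storage approach: you tag each data directory with a storage type in dfs.datanode.data.dir, then attach a storage policy to a path with the hdfs CLI. The path /warm/data below is purely illustrative:

```shell
# In hdfs-site.xml, tag volumes with a storage type, e.g.:
#   dfs.datanode.data.dir = [DISK]/data/1/dfs/dn,[ARCHIVE]/data/2/dfs/dn
# Then assign and verify a storage policy on a directory:
hdfs storagepolicies -setStoragePolicy -path /warm/data -policy Warm
hdfs storagepolicies -getStoragePolicy -path /warm/data
```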
So, simply put: use the same hardware, or you will more than likely run into issues.