Created 04-04-2017 11:56 AM
Hello, I would like to implement tiered storage in a cluster. Suppose each node has several drives (HDDs and SSDs). If a move command is issued from one tier to another, will each node try to perform the operation "locally", moving its share of blocks between its own drives, or will the data be redistributed across the cluster, consuming network capacity?
Created 04-04-2017 09:06 PM
@Riccardo Iacomini, are you asking about the HDFS move/rename command? Move is purely a metadata operation on the NameNode and does not result in any data movement until the HDFS Mover utility is run.
Edit:
The Mover will move blocks within the same node when possible and thus try to avoid network activity.
If that is not possible (e.g. when a node has no SSDs or when its local SSDs are full), it will move block replicas across the network to another node that has the target media.
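To make the local-first behaviour described above concrete, here is a toy Python model of the decision. It is only an illustration of the rule stated in this answer, not the Mover's actual code: the function name `pick_target`, the `free` dictionary layout, and the node names are all hypothetical, and real constraints (for example, a node not holding two replicas of the same block) are ignored.

```python
def pick_target(source_node, target_media, free):
    """Toy model: choose where a block replica should go.

    free maps node name -> {media type -> free block slots} (hypothetical layout).
    Returns (target_node, uses_network); target_node is None if nothing fits.
    """
    # Prefer the node that already holds the replica: no network traffic.
    if free.get(source_node, {}).get(target_media, 0) > 0:
        return source_node, False
    # Otherwise fall back to any other node that has room on the target media.
    for node in sorted(free):
        if node != source_node and free[node].get(target_media, 0) > 0:
            return node, True
    # No node can take the replica on the requested media.
    return None, False


if __name__ == "__main__":
    free = {"dn1": {"SSD": 0, "DISK": 5}, "dn2": {"SSD": 3, "DISK": 2}}
    print(pick_target("dn2", "SSD", free))  # local SSD has room
    print(pick_target("dn1", "SSD", free))  # local SSD is full, must go remote
```

In this model the network is touched only when the source node has no free capacity on the target media, which matches the two cases described above.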
Created 04-05-2017 08:42 AM
Thank you. I've read that documentation page, but maybe I misunderstood something. It may make more sense to explain what I would actually like to achieve: I would like to create two separate folders on HDFS, one with the All_SSD storage policy and the other with the Hot storage policy. Files placed in these folders will be moved internally to SSDs or HDDs accordingly. When I move a file from one folder to the other, will it involve the whole cluster and redistribute the data, or will each node try to move its share of blocks between its own drives? From your answer, I assume the move operation itself does nothing at the physical level, and everything at that level is handled by the Mover tool, right? In that case, the question simply applies to the Mover: will it try to perform the operations locally, or will it involve all the nodes?
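For reference, the setup described above could be scripted roughly as follows. This is a minimal sketch, not a tested procedure: the directory names `/data/ssd` and `/data/hot` and the file name are placeholders, and the helper `tiering_commands` is hypothetical. It assumes the file has no storage policy set explicitly, so that after the rename it takes its effective policy from the new parent directory. By default the script only prints the commands (dry run).

```python
import shlex
import subprocess


def tiering_commands(ssd_dir, hot_dir, filename):
    """Build the HDFS CLI calls for the two-folder tiering setup (paths are placeholders)."""
    return [
        # Create both directories.
        ["hdfs", "dfs", "-mkdir", "-p", ssd_dir, hot_dir],
        # Attach a storage policy to each directory.
        ["hdfs", "storagepolicies", "-setStoragePolicy", "-path", ssd_dir, "-policy", "ALL_SSD"],
        ["hdfs", "storagepolicies", "-setStoragePolicy", "-path", hot_dir, "-policy", "HOT"],
        # Rename the file into the other directory: a metadata-only operation.
        ["hdfs", "dfs", "-mv", f"{ssd_dir}/{filename}", f"{hot_dir}/"],
        # Physically migrate blocks that no longer match their policy.
        ["hdfs", "mover", "-p", hot_dir],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(shlex.quote(arg) for arg in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(tiering_commands("/data/ssd", "/data/hot", "events.parquet"))
```

Only the last command moves data on disk; everything before it changes NameNode metadata.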
Created 04-05-2017 01:36 PM
The Mover will move blocks within the same node when possible and thus try to avoid network activity.
If that is not possible (e.g. when a node has no SSDs or when its local SSDs are full), it will move block replicas across the network to another node that has the target media.
I've edited my answer.