
HDFS tiered storage - network usage

Contributor

Hello, I would like to implement tiered storage in a cluster. Suppose each node has several drives (HDDs and SSDs). If a move command is issued from one tier to another, will each node perform the operation locally, moving its share of blocks between its own drives, or will the data be redistributed across the cluster, consuming network capacity?

1 ACCEPTED SOLUTION


@Riccardo Iacomini, are you asking about the HDFS move/rename command? Move is purely a metadata operation on the NameNode and does not result in any data movement until the HDFS Mover utility is run.

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html#Mover_-_A...

Edit:

The Mover will move blocks within the same node when possible and thus try to avoid network activity.

If that is not possible (e.g. when a node has no SSD or when its local SSDs are full), it will move block replicas across the network to another node that has the target storage media.
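To make the local-first behaviour described above concrete, here is a toy Python model of the decision for a single replica. It is only a sketch under stated assumptions, not the Mover's actual code: the `Node` class, the `choose_target` function, and the capacity numbers are all hypothetical, and the model ignores details a real cluster has to respect (for example, not placing two replicas of the same block on one node).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical DataNode: free space per storage type, in arbitrary units."""
    name: str
    free: dict = field(default_factory=dict)


def choose_target(source, nodes, target_type, block_size):
    """Return (node, crosses_network) for one replica, or None if nothing fits."""
    # Prefer the node that already holds the replica: a local move
    # between drives generates no network traffic.
    if source.free.get(target_type, 0) >= block_size:
        return source, False
    # Otherwise fall back to another node with room on the target media,
    # which means the replica is copied over the network.
    for node in nodes:
        if node is not source and node.free.get(target_type, 0) >= block_size:
            return node, True
    return None


nodes = [
    Node("dn1", {"SSD": 256, "DISK": 4096}),
    Node("dn2", {"DISK": 4096}),             # no SSD at all
    Node("dn3", {"SSD": 64, "DISK": 4096}),  # SSD nearly full
]

for src in nodes:
    target, remote = choose_target(src, nodes, "SSD", block_size=128)
    kind = "network transfer" if remote else "local move"
    print(f"replica on {src.name} -> {target.name} ({kind})")
```

In this toy cluster only the replica on dn1 stays local; the replicas on dn2 and dn3 have to travel to dn1 because their own nodes lack usable SSD space.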


3 REPLIES

Contributor

Thank you. I've read that documentation page, but I may have misunderstood something, so let me explain what I would actually like to achieve. I would like to create two separate folders on HDFS, one with the All_SSD storage policy and the other with the Hot storage policy, so that files placed in these folders are stored on SSDs or HDDs accordingly. When I move a file from one folder to the other, will that involve the whole cluster and redistribute the data, or will each node move its share of blocks between its own drives? From your answer, I assume the move operation itself does nothing at the physical level and that everything at that level is handled by the Mover tool, right? In that case, the question simply applies to the Mover: will it try to perform the operations locally, or will it involve all the nodes?
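The two-folder setup described above might translate into the following sequence of HDFS commands, built here as a small Python sketch that only prints them and executes nothing. The directory paths and file name are hypothetical placeholders, the policy names are written as they appear in this thread, and the ordering (set policies, rename, then run the Mover on the destination) is an assumption about the intended workflow.

```python
import shlex


def tiering_commands(ssd_dir, hot_dir, file_name):
    """Build the CLI steps as argument lists; nothing is executed here."""
    return [
        # Attach a storage policy to each directory.
        ["hdfs", "storagepolicies", "-setStoragePolicy",
         "-path", ssd_dir, "-policy", "All_SSD"],
        ["hdfs", "storagepolicies", "-setStoragePolicy",
         "-path", hot_dir, "-policy", "Hot"],
        # Rename the file into the SSD-backed directory (metadata only).
        ["hdfs", "dfs", "-mv",
         f"{hot_dir}/{file_name}", f"{ssd_dir}/{file_name}"],
        # Ask the Mover to migrate blocks that violate the policy.
        ["hdfs", "mover", "-p", ssd_dir],
    ]


# Hypothetical paths, for illustration only.
for cmd in tiering_commands("/data/ssd", "/data/hot", "events.parquet"):
    print(shlex.join(cmd))
```

Until the last command runs, the renamed file's blocks stay on whatever media they were originally written to.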


The Mover will move blocks within the same node when possible and thus try to avoid network activity.

If that is not possible (e.g. when a node has no SSD or when its local SSDs are full), it will move block replicas across the network to another node that has the target storage media.

I've edited my answer.