Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

HDFS tiered storage - network usage

Solved Go to solution

HDFS tiered storage - network usage

New Contributor

Hello, I would like to implement tiered storage in a cluster. Suppose each node has several drives (HDDs and SSDs). If a move command is issued from a tier to another, will each node try to perform the operation "locally", moving its share of blocks between its drives, or conversely will the data be distributed again in the cluster hence consuming network capacity?

1 ACCEPTED SOLUTION

Accepted Solutions

Re: HDFS tiered storage - network usage

@Riccardo Iacomini, are you asking about the HDFS move/rename command? Move is purely a metadata operation on the NameNode and does not result in any data movement until the HDFS Mover utility is run.

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html#Mover_-_A...

Edit:

The Mover will move blocks within the same node when possible and thus try to avoid network activity.

If that is not possible (e.g. when a node doesn't have SSD or when the local SSDs are full), it will move block replicas across the network to another node that has the target media.

3 REPLIES 3

Re: HDFS tiered storage - network usage

@Riccardo Iacomini, are you asking about the HDFS move/rename command? Move is purely a metadata operation on the NameNode and does not result in any data movement until the HDFS Mover utility is run.

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html#Mover_-_A...

Edit:

The Mover will move blocks within the same node when possible and thus try to avoid network activity.

If that is not possible (e.g. when a node doesn't have SSD or when the local SSDs are full), it will move block replicas across the network to another node that has the target media.

Highlighted

Re: HDFS tiered storage - network usage

New Contributor

Thank you, I've read that documentation page, but maybe I misunderstood something. Trying to explain what I actually would like to achieve maybe makes more sense: I would like to create two separate folders on HDFS, one with storage policy All_SSD, and the other with storage policy of Hot. Files placed in these folders will be moved internally to SSDs or HDDs accordingly. When I move one file from one folder to another, will it actually involve the whole cluster, hence redistributing data, or the node will try to move its share of blocks between drives? From your answer, I assume this move operation actually does nothing on a physical level, everything on that level will be managed by the mover tool, right? In that case, the question simply applies to the mover: will it try to perform the operations locally or will it involve all the nodes?

Re: HDFS tiered storage - network usage

The Mover will move blocks within the same node when possible and thus try to avoid network activity.

If that is not possible (e.g. when a node doesn't have SSD or when the local SSDs are full), it will move block replicas across the network to another node that has the target media.

I've edited my answer.