HDFS tiered storage - network usage
Labels: Apache Hadoop
Created 04-04-2017 11:56 AM
Hello, I would like to implement tiered storage in a cluster. Suppose each node has several drives (HDDs and SSDs). If a move command is issued from one tier to another, will each node try to perform the operation "locally", moving its share of blocks between its own drives, or will the data be redistributed across the cluster, consuming network capacity?
Created 04-04-2017 09:06 PM
@Riccardo Iacomini, are you asking about the HDFS move/rename command? Move is purely a metadata operation on the NameNode and does not result in any data movement until the HDFS Mover utility is run.
Edit:
The Mover will move blocks within the same node when possible and thus try to avoid network activity.
If that is not possible (e.g. when a node doesn't have SSD or when the local SSDs are full), it will move block replicas across the network to another node that has the target media.
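The local-first preference described above can be illustrated with a toy model. This is only a sketch of the idea, not the Mover's actual code: the `Node` class, the `pick_target` function, the node names, and the block-count capacities are all hypothetical, and the real Mover applies further constraints that are not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    name: str
    free_ssd_blocks: int  # free SSD capacity counted in blocks (toy unit, an assumption)


def pick_target(source: str, nodes: Dict[str, Node]) -> Optional[Tuple[str, bool]]:
    """Choose where one replica stored on `source` should land on SSD.

    Returns (target_node, uses_network), or None if no node has SSD space.
    """
    local = nodes[source]
    if local.free_ssd_blocks > 0:
        # Preferred case: move between drives of the same node, no network traffic.
        local.free_ssd_blocks -= 1
        return source, False
    for node in nodes.values():
        if node.name != source and node.free_ssd_blocks > 0:
            # Fallback: ship the replica to another node that has the target media.
            node.free_ssd_blocks -= 1
            return node.name, True
    return None


# Hypothetical three-node cluster.
nodes = {
    "dn1": Node("dn1", free_ssd_blocks=1),
    "dn2": Node("dn2", free_ssd_blocks=0),  # no SSD, or local SSDs are full
    "dn3": Node("dn3", free_ssd_blocks=5),
}
replica_locations = ["dn1", "dn1", "dn2"]
plan = [pick_target(src, nodes) for src in replica_locations]
print(plan)
```

In this model only the first replica on `dn1` stays local. The second one finds the local SSD full, and the replica on `dn2` has no local SSD at all, so both go over the network to `dn3`.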
Created 04-05-2017 08:42 AM
Thank you, I've read that documentation page, but I may have misunderstood something. It probably makes more sense to explain what I would actually like to achieve: I would like to create two separate folders on HDFS, one with the All_SSD storage policy and the other with the Hot storage policy. Files placed in these folders will be moved internally to SSDs or HDDs accordingly. When I move a file from one folder to the other, will that involve the whole cluster and redistribute the data, or will each node try to move its share of blocks between its own drives? From your answer, I assume the move operation itself does nothing at the physical level and everything at that level is handled by the Mover tool, right? In that case, the question simply applies to the Mover: will it try to perform the operations locally, or will it involve all the nodes?
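The two-folder setup described above could be scripted roughly as follows. This is a minimal sketch that only builds and prints the command lines without running them. The paths `/data/ssd` and `/data/hot`, the file name `part-0000`, and the helper `tiering_commands` are hypothetical; the policy names are spelled as in this thread, so check them against the output of `hdfs storagepolicies -listPolicies` on your cluster.

```python
import shlex
from typing import List


def tiering_commands(ssd_dir: str, hot_dir: str, file_name: str) -> List[List[str]]:
    """Build the HDFS commands for the two-folder scenario (nothing is executed)."""
    src = f"{ssd_dir}/{file_name}"
    return [
        # Create both folders.
        ["hdfs", "dfs", "-mkdir", "-p", ssd_dir, hot_dir],
        # Attach a storage policy to each folder.
        ["hdfs", "storagepolicies", "-setStoragePolicy", "-path", ssd_dir, "-policy", "All_SSD"],
        ["hdfs", "storagepolicies", "-setStoragePolicy", "-path", hot_dir, "-policy", "Hot"],
        # Rename into the other folder: a NameNode metadata operation.
        ["hdfs", "dfs", "-mv", src, hot_dir + "/"],
        # Block replicas are only migrated once the Mover is run on the path.
        ["hdfs", "mover", "-p", hot_dir],
    ]


commands = tiering_commands("/data/ssd", "/data/hot", "part-0000")
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping each command as a list of arguments means the same lists could later be passed to `subprocess.run` on a host with the Hadoop client installed, without any shell quoting concerns.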
Created 04-05-2017 01:36 PM
The Mover will move blocks within the same node when possible and thus try to avoid network activity.
If that is not possible (e.g. when a node doesn't have SSD or when the local SSDs are full), it will move block replicas across the network to another node that has the target media.
I've edited my answer.
