Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

HDFS Snapshot - Size

Expert Contributor

Hello experts,

I'm trying to understand the total size and block usage of an HDFS snapshot.

I have a directory, /user/x/data, and an hdfs ls tells me it holds 1.1 TB.

If I take a snapshot of /user/x/data, will the snapshot consume the same amount of space, and how much block space does it use?

The output of hdfs dfsadmin -report was 19.6 TB before the snapshot, and it was still the same afterwards.

If a snapshot takes the same space as its source, why doesn't the report change?

Thanks, Mayank

1 ACCEPTED SOLUTION


@mkataria

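With HDFS snapshots, no data is copied up front when a new snapshot is created. A snapshot is simply a point-in-time record of the directory that references the existing blocks. So when you first take a snapshot, your HDFS storage usage stays the same, which is why the hdfs dfsadmin -report figure did not change. Extra space is consumed only once the snapshotted data is modified or deleted, because the blocks the snapshot still references have to be kept alongside any newly written data. This follows the Copy on Write (COW) concept.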

Please take a look at the JIRA below. It contains the discussion that led to the design and is quite informative.

https://issues.apache.org/jira/browse/HDFS-2802
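
To make the storage accounting described above concrete, here is a small toy model in Python. It is only a sketch of the idea, not HDFS code or any HDFS API: the class ToyNamespace, its methods, the block ids, and the sizes are all made up for illustration, and replication is ignored.

```python
# Toy model only: NOT the HDFS implementation or API.
# All names, block ids, and sizes here are hypothetical.

class ToyNamespace:
    def __init__(self):
        self.live = {}        # path -> set of block ids visible in the live tree
        self.snapshots = {}   # snapshot name -> {path: frozenset of block ids}
        self.block_size = {}  # block id -> size in bytes

    def write(self, path, blocks):
        """Create or replace a file; blocks maps block id -> size in bytes."""
        self.block_size.update(blocks)
        self.live[path] = set(blocks)

    def delete(self, path):
        """Remove a file from the live tree (snapshots are unaffected)."""
        self.live.pop(path, None)

    def snapshot(self, name):
        """Record block references only; no block data is copied."""
        self.snapshots[name] = {p: frozenset(b) for p, b in self.live.items()}

    def used_bytes(self):
        """A block takes space while the live tree or any snapshot references it."""
        referenced = set()
        for blocks in self.live.values():
            referenced |= blocks
        for snap in self.snapshots.values():
            for blocks in snap.values():
                referenced |= blocks
        return sum(self.block_size[b] for b in referenced)


ns = ToyNamespace()
ns.write("/user/x/data/a", {"b1": 128, "b2": 128})
before_snapshot = ns.used_bytes()

ns.snapshot("s1")
after_snapshot = ns.used_bytes()      # unchanged: the snapshot only references b1, b2

ns.delete("/user/x/data/a")
after_delete = ns.used_bytes()        # still unchanged: s1 keeps b1, b2 alive

ns.write("/user/x/data/b", {"b3": 64})
after_new_write = ns.used_bytes()     # grows by the new block only

print(before_snapshot, after_snapshot, after_delete, after_new_write)
```

In this model, usage is the same before and after the snapshot, stays the same when the live file is deleted (the space is not freed while the snapshot exists), and grows only when new blocks are written.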
