HDFS Snapshot - Size

Hello experts,

I'm trying to understand how much total space (in terms of blocks) an HDFS snapshot uses.

I have a directory, /user/x/data, and an HDFS listing shows that it holds 1.1 TB.

If I take a snapshot of /user/x/data, will the snapshot consume the same amount of space, and how much block storage will it use?

Before the snapshot, hdfs dfsadmin -report showed 19.6 TB used, and after taking the snapshot it still showed the same figure.

If snapshots take the same space as the source, why doesn't the report change?

Thanks Mayank

Accepted Solution

@mkataria

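With HDFS snapshots, no data is copied when a snapshot is created. A snapshot is a point-in-time record kept in the NameNode's metadata: it references the existing blocks rather than duplicating them. So when you first take a snapshot, your HDFS storage usage stays the same. Usage grows only after the data under the snapshotted directory changes: the NameNode records the modifications, and blocks that are deleted or overwritten in the live tree are kept on the DataNodes for as long as a snapshot still references them. This follows the copy-on-write (COW) concept.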
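
To make that accounting concrete, here is a toy Python model. It is a sketch under heavy simplifying assumptions: files occupy whole blocks, replication is ignored (the "DFS Used" figure in hdfs dfsadmin -report counts raw, replicated usage), and only whole-file writes and deletes are modeled, not appends or truncates. The class and method names are hypothetical and are not part of any Hadoop API. It shows that creating a snapshot adds no blocks, that deleting a file keeps its blocks alive while a snapshot references them, and that the space is freed only once the snapshot is deleted.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNamespace:
    """Hypothetical, simplified model of HDFS snapshot space accounting.

    Files map to lists of block IDs. A snapshot is a frozen copy of the
    path -> block-list mapping (metadata only; no blocks are duplicated).
    Used storage is every block referenced by the live tree or any snapshot.
    Replication, partial blocks, appends, and truncates are ignored.
    """
    block_size_mb: int = 128
    files: dict[str, list[int]] = field(default_factory=dict)
    snapshots: dict[str, dict[str, tuple[int, ...]]] = field(default_factory=dict)
    next_block_id: int = 0

    def write_file(self, path: str, num_blocks: int) -> None:
        # Allocate brand-new block IDs for the file.
        start = self.next_block_id
        self.files[path] = list(range(start, start + num_blocks))
        self.next_block_id += num_blocks

    def delete_file(self, path: str) -> None:
        # Removes the file from the live tree only; snapshots keep their references.
        del self.files[path]

    def create_snapshot(self, name: str) -> None:
        # Record the current block lists; the blocks themselves are shared, not copied.
        self.snapshots[name] = {p: tuple(b) for p, b in self.files.items()}

    def delete_snapshot(self, name: str) -> None:
        del self.snapshots[name]

    def used_blocks(self) -> set[int]:
        live = {b for blocks in self.files.values() for b in blocks}
        in_snapshots = {
            b
            for snap in self.snapshots.values()
            for blocks in snap.values()
            for b in blocks
        }
        # A block shared by the live tree and a snapshot is stored once.
        return live | in_snapshots

    def used_mb(self) -> int:
        return len(self.used_blocks()) * self.block_size_mb


ns = ToyNamespace()
ns.write_file("/user/x/data/a", 8)
ns.write_file("/user/x/data/b", 8)
before_snapshot = ns.used_mb()    # 16 blocks
ns.create_snapshot("s1")
after_snapshot = ns.used_mb()     # still 16 blocks: nothing copied
ns.delete_file("/user/x/data/a")
after_delete = ns.used_mb()       # still 16 blocks: s1 keeps a's blocks
ns.write_file("/user/x/data/c", 4)
after_write = ns.used_mb()        # 20 blocks: new data is new usage
ns.delete_snapshot("s1")
after_snapshot_delete = ns.used_mb()  # 12 blocks: a's blocks are freed

print(before_snapshot, after_snapshot, after_delete, after_write, after_snapshot_delete)
```

Running it prints `2048 2048 2048 2560 1536`, which mirrors the behavior in the question: the report does not move when the snapshot is taken, and it only diverges from the live directory size once the data changes.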

Please take a look at the JIRA below. It contains the discussion that led to the design and is quite informative.

https://issues.apache.org/jira/browse/HDFS-2802
