Created on 07-20-2016 06:22 PM - edited 09-16-2022 03:30 AM
Hello experts,
I'm trying to understand the total size / block size used by an HDFS snapshot.
I have a dir like /user/x/data, and an hdfs ls tells me it holds 1.1 TB.
So if I take a snapshot of /user/x/data, will the snapshot consume the same amount of space, and how much block size does it use?
My earlier output from hdfs dfsadmin -report was 19.6 TB, and after taking the snapshot it was still the same.
If a snapshot takes the same space as the source, why doesn't the report change?
Thanks, Mayank
Created 07-20-2016 08:50 PM
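With HDFS snapshots, no data is copied up front when a new snapshot is created. A snapshot is simply a point-in-time record of the directory that references the existing blocks, so when you first take a snapshot your HDFS storage usage stays the same, which is why your dfsadmin report did not change. Extra space is consumed only as the data changes afterwards: new data is written as usual, and blocks that you delete or overwrite in the live directory stay on disk for as long as a snapshot still references them. This is similar in spirit to the Copy on Write (COW) concept.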
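To make the space accounting concrete, here is a minimal toy model in Python. It is not HDFS code and does not call any HDFS API; the class `ToyNamespace`, its methods, and the 128 MB block size are all assumptions made up for illustration. It only sketches the idea that a snapshot records references to blocks, so usage grows only when blocks are added, and deleted blocks stay counted while a snapshot still points at them.

```python
# Toy model only: this is NOT HDFS code, just reference bookkeeping.
BLOCK_SIZE_MB = 128  # assumed block size for the illustration


class ToyNamespace:
    """Hypothetical stand-in for a namespace with snapshot support."""

    def __init__(self):
        self.live = {}        # path -> list of block ids in the live tree
        self.snapshots = {}   # snapshot name -> {path -> list of block ids}
        self._next_block = 0

    def write_file(self, path, num_blocks):
        # Allocate fresh block ids for a new file.
        blocks = list(range(self._next_block, self._next_block + num_blocks))
        self._next_block += num_blocks
        self.live[path] = blocks

    def delete_file(self, path):
        # Removes the file from the live tree only.
        del self.live[path]

    def create_snapshot(self, name):
        # Only metadata (the block id lists) is recorded, no block data.
        self.snapshots[name] = {p: list(b) for p, b in self.live.items()}

    def used_mb(self):
        # A block occupies space while the live tree or any snapshot references it.
        referenced = set()
        for blocks in self.live.values():
            referenced.update(blocks)
        for snap in self.snapshots.values():
            for blocks in snap.values():
                referenced.update(blocks)
        return len(referenced) * BLOCK_SIZE_MB


ns = ToyNamespace()
ns.write_file("/user/x/data/a", 4)
before = ns.used_mb()

ns.create_snapshot("s1")
after_snapshot = ns.used_mb()   # unchanged: the snapshot shares the blocks

ns.delete_file("/user/x/data/a")
after_delete = ns.used_mb()     # unchanged: the snapshot still references them

ns.write_file("/user/x/data/b", 2)
after_write = ns.used_mb()      # grows by the newly written blocks only

print(before, after_snapshot, after_delete, after_write)
```

In this sketch the usage figure is the same before and after `create_snapshot`, which mirrors the unchanged report in the question. It also shows the flip side: deleting data that a snapshot covers does not free space until the snapshot itself is removed.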
Please take a look at the below JIRA. It contains the discussion that led to the design and is quite informative.