HDFS Snapshot - Size
Labels: HDFS
Created on 07-20-2016 06:22 PM - edited 09-16-2022 03:30 AM
Hello experts,
I'm trying to understand the total size and block usage of an HDFS snapshot.
I have a directory, /user/x/data, and an hdfs ls shows that it holds 1.1 TB.
If I take a snapshot of /user/x/data, will the snapshot consume the same amount of space, and how much block storage does it use?
Before the snapshot, hdfs dfsadmin -report showed 19.6 TB, and after taking the snapshot it still showed the same value.
If snapshots take the same space as the source, why doesn't the report change?
Thanks, Mayank
Created 07-20-2016 08:50 PM
With HDFS snapshots, no data is copied up front when a new snapshot is created. A snapshot is simply a pointer to the state of the directory at a point in time. So when you first take a snapshot, your HDFS storage usage stays the same. Only when you later modify or delete data under the snapshotted directory does the snapshot start to consume additional space, because the blocks it still references must be kept. This follows the Copy on Write (COW) concept.
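To make the accounting concrete, here is a minimal toy model in Python. It is a hypothetical simplification, not HDFS code: the class ToyNamespace, its methods, and the block sizes (in MB) are all made up for illustration, and replication is ignored. Files are lists of block IDs, a snapshot copies only those lists, and a block counts toward used space as long as any live file or snapshot still references it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNamespace:
    """Hypothetical, simplified model of snapshot space accounting.

    Not the real HDFS implementation: replication, quotas and metadata
    overhead are ignored. Sizes are in MB for readability.
    """
    block_size: dict = field(default_factory=dict)  # block id -> size
    live: dict = field(default_factory=dict)        # path -> [block ids]
    snapshots: dict = field(default_factory=dict)   # name -> {path: [block ids]}
    next_id: int = 0

    def write(self, path, sizes):
        # Allocate one new block per size and point the file at them.
        ids = []
        for size in sizes:
            self.block_size[self.next_id] = size
            ids.append(self.next_id)
            self.next_id += 1
        self.live[path] = ids

    def snapshot(self, name):
        # Copy only the path -> block-list mapping (pointers), never block data.
        self.snapshots[name] = {p: list(b) for p, b in self.live.items()}

    def delete(self, path):
        # Remove the file from the live namespace; its blocks may survive.
        del self.live[path]

    def used(self):
        # A block occupies storage while any live file or snapshot references it.
        referenced = {b for blocks in self.live.values() for b in blocks}
        for view in self.snapshots.values():
            referenced.update(b for blocks in view.values() for b in blocks)
        return sum(self.block_size[b] for b in referenced)


ns = ToyNamespace()
ns.write("/user/x/data/a", [128, 128, 64])
ns.write("/user/x/data/b", [128])
before = ns.used()

ns.snapshot("s0")
after_snapshot = ns.used()   # unchanged: the snapshot holds pointers only

ns.delete("/user/x/data/a")
after_delete = ns.used()     # still unchanged: s0 keeps a's blocks alive

ns.write("/user/x/data/a", [128])
after_rewrite = ns.used()    # grows: new blocks plus the retained old ones

print(before, after_snapshot, after_delete, after_rewrite)
```

In this model, taking the snapshot leaves used space unchanged, deleting a file frees nothing while s0 still references its blocks, and rewriting the file adds new blocks on top of the retained ones.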
Please take a look at the JIRA below. It contains the discussion that led to the design and is quite informative.
