
HDFS Snapshots Overview

HDFS Snapshots are read-only point-in-time copies of the file system. A snapshot can be taken of a subtree of the file system or of the entire file system. Common use cases are data backup, protection against user errors, and disaster recovery.

The implementation of HDFS Snapshots is efficient in the following ways:

1) Snapshot creation is instantaneous. The cost is O(1) excluding the inode lookup time.

[Figure: 8337-snapshot1.png]

2) Additional memory is used only when modifications are made relative to a snapshot. Memory usage is O(M), where M is the number of modified files/directories.

[Figure: 8338-snapshot2.png]

3) Blocks in datanodes are not copied. The snapshot files record the block list and the file size.

[Figure: 8339-snapshot3.png]

4) Snapshots do not adversely affect regular HDFS operations. Modifications are recorded in reverse chronological order, so the current data can be accessed directly. Snapshot data is computed by subtracting the modifications from the current data (snapshot data = current data – modifications), so reading snapshotted data has a minor performance cost that depends on the number of modifications.

[Figure: 8340-snapshot4.png]
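The four properties above can be illustrated with a small toy model. This is only a sketch of the idea, not HDFS code: the class `SnapshottableDir`, its methods, and the file and block names are all hypothetical, and a plain dictionary stands in for the NameNode's directory state. Creating a snapshot appends an empty diff (O(1)), each modification saves the previous state of the touched file in the latest diff (O(M) memory), a file's state is just its block list and length (no block data is copied), and a snapshot view is rebuilt by undoing diffs from newest to oldest.

```python
class SnapshottableDir:
    """Toy model of a snapshottable directory (illustrative only, not HDFS code)."""

    _ABSENT = object()  # marks "this file did not exist when the snapshot was taken"

    def __init__(self):
        self.current = {}    # file name -> (tuple of block ids, length)
        self.snapshots = []  # (snapshot name, diff dict), oldest first

    def create_snapshot(self, name):
        # O(1): only an empty diff is appended; no file or block is copied.
        self.snapshots.append((name, {}))

    def _record_old(self, fname):
        # Save the pre-modification state in the latest snapshot's diff,
        # at most once per file, so memory grows with modified files only.
        if self.snapshots:
            diff = self.snapshots[-1][1]
            if fname not in diff:
                diff[fname] = self.current.get(fname, self._ABSENT)

    def write(self, fname, blocks, length):
        self._record_old(fname)
        self.current[fname] = (tuple(blocks), length)

    def delete(self, fname):
        self._record_old(fname)
        self.current.pop(fname, None)

    def read_snapshot(self, name):
        # Start from the current state and undo diffs, newest first,
        # until the requested snapshot's diff has been applied.
        view = dict(self.current)
        for snap_name, diff in reversed(self.snapshots):
            for fname, old in diff.items():
                if old is self._ABSENT:
                    view.pop(fname, None)
                else:
                    view[fname] = old
            if snap_name == name:
                return view
        raise KeyError(name)


d = SnapshottableDir()
d.write("a.log", ["blk_1"], 128)
d.create_snapshot("s1")
d.write("a.log", ["blk_1", "blk_2"], 200)  # append: blk_1 is referenced again, not copied
d.write("b.log", ["blk_3"], 64)
d.create_snapshot("s2")
d.delete("a.log")

s1_view = d.read_snapshot("s1")
s2_view = d.read_snapshot("s2")
```

In this sketch, reading the current state never touches the diffs, while reading `s1` has to undo two diffs. That mirrors the point that regular operations are unaffected and that the cost of reading snapshotted data grows with the number of modifications.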

**See Also**

HDFS Snapshots - 2) Operations

Last update: 08-17-2019 09:13 AM