HDFS snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or on the entire file system. Snapshots are very efficient because they only copy data that has changed. Data can be restored from any previous snapshot. Common use cases for snapshots are data backup and disaster recovery.
HDFS Snapshot Extension:
Falcon will support HDFS snapshot-based replication through the HDFS Snapshot extension. Using this feature, users can:
Create and manage snapshots on source/target directories.
Mirror data from source to target for disaster recovery using these snapshots.
Perform retention on the snapshots created on source and target.
Snapshot replication will only work from a single source directory to a single target directory.
For snapshot replication to work, the following requirements must be met:
Both source and target clusters must run Hadoop 2.7.0 or higher.
The user submitting and scheduling the Falcon extension must have permissions on both the source and target directories.
Both directories must be snapshottable.
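As a minimal sketch of the last requirement, a directory can be made snapshottable with the standard HDFS admin command and then verified. The directory path below is a hypothetical placeholder, and the run and allow_snapshots helpers are illustrative names, not part of Falcon or Hadoop. The script defaults to a dry run that only prints the commands; set DRY_RUN=0 to execute them, which is assumed to be done by an HDFS superuser on each cluster.

```shell
#!/bin/sh
# Sketch: mark a directory as snapshottable and list snapshottable directories.
# DRY_RUN defaults to 1, so the commands are printed rather than executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

allow_snapshots() {
    dir="$1"
    # Enabling snapshots on a directory requires HDFS superuser privileges.
    run hdfs dfsadmin -allowSnapshot "$dir"
    # Lists the snapshottable directories visible to the current user.
    run hdfs lsSnapshottableDir
}

# Hypothetical directory; run once on the source cluster and once on the
# target cluster with the directory that applies to each.
allow_snapshots /apps/falcon/snapshots/source
```

The same call would be repeated on the target cluster with the target directory, since both sides of the replication must be snapshottable.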
To perform HDFS Snapshot replication in Falcon, we need to create the source and target cluster entities, and create the staging and working directories with the appropriate permissions. Use the following steps to accomplish this.