
HDFS Snapshot take full backup or incremental backup

New Contributor

I would like to know whether the snapshot feature in Cloudera takes a full backup or an incremental backup.

If it takes a full backup every time, that is a drawback. For example, if I add 2 MB of data to my 1000 GB file, it would back up the entire 1000 GB file again rather than just the 2 MB I changed.

Could someone please answer this?


Re: HDFS Snapshot take full backup or incremental backup

Contributor

A snapshot is not a full copy of the data; it is a copy of the metadata at that point in time. Blocks on the DataNodes are not copied: the snapshot records each file's block list and file size. No data is copied, so adding 2 MB to a 1000 GB file does not cause the 1000 GB to be stored again.
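
To make the idea concrete, here is a toy Python model of metadata-only snapshots. It is only a sketch and not HDFS code: `ToyNameNode`, `ToyFile`, the 128 MB block size, and the rule that every append allocates fresh blocks are all assumptions made for illustration.

```python
# Toy model only: not HDFS code. All names here are hypothetical.
from dataclasses import dataclass, field

BLOCK_SIZE_MB = 128  # assumed block size for the illustration


@dataclass
class ToyFile:
    block_ids: list = field(default_factory=list)
    length_mb: int = 0


class ToyNameNode:
    def __init__(self):
        self.files = {}         # path -> ToyFile (live namespace)
        self.snapshots = {}     # snapshot name -> {path: (block ids, length)}
        self.blocks_stored = 0  # blocks physically held by the "datanodes"
        self._next_block_id = 0

    def append(self, path, size_mb):
        """Append data; simplification: always allocates new blocks."""
        f = self.files.setdefault(path, ToyFile())
        new_blocks = -(-size_mb // BLOCK_SIZE_MB)  # ceiling division
        for _ in range(new_blocks):
            f.block_ids.append(self._next_block_id)
            self._next_block_id += 1
        self.blocks_stored += new_blocks
        f.length_mb += size_mb

    def create_snapshot(self, name):
        """Record each file's block list and length; no block is copied."""
        self.snapshots[name] = {
            path: (tuple(f.block_ids), f.length_mb)
            for path, f in self.files.items()
        }


nn = ToyNameNode()
nn.append("/data/big", 1000 * 1024)   # 1000 GB expressed in MB
before = nn.blocks_stored
nn.create_snapshot("s1")
after_snapshot = nn.blocks_stored     # unchanged by the snapshot
nn.append("/data/big", 2)             # add 2 MB
nn.create_snapshot("s2")
print(before, after_snapshot, nn.blocks_stored)  # 8000 8000 8001
```

In this model, taking a snapshot never changes `blocks_stored`; only the 2 MB append adds a block, and both snapshots keep pointing at the blocks that already exist.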

Re: HDFS Snapshot take full backup or incremental backup

New Contributor

Does that mean that when I copy my data from the Prod cluster to the DR cluster, it copies only the metadata and not the actual data? If so, I may lose my data if Prod goes down, because my DR cluster has only the metadata and not the actual data.

Re: HDFS Snapshot take full backup or incremental backup

Master Guru
@Amir - No.

What @RobertM describes is how capturing a snapshot works internally
within a single HDFS. That is, it is about the cost of taking the snapshot
in the distributed filesystem, not about copying data elsewhere.

When copying data between clusters or storage systems, copying a
snapshotted file is no different from copying a regular file: both are
copied the same way, bytes and metadata together. There's no "copy only
metadata" operation.
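
As a rough illustration of that point, here is a toy Python sketch. It is not the HDFS or DistCp API: `copy_file` and the dictionaries standing in for each cluster's block storage are hypothetical. The snapshot view is just a list of block ids on Prod, but copying it to DR still reads and writes the bytes themselves.

```python
# Toy model only: hypothetical names, not an HDFS or DistCp API.
def copy_file(src_blocks, block_ids, dst_blocks):
    """Copy a file by reading every block's bytes and writing them to dst."""
    new_ids = []
    copied = 0
    for block_id in block_ids:
        data = src_blocks[block_id]        # read the actual bytes
        new_id = len(dst_blocks)
        dst_blocks[new_id] = bytes(data)   # write an independent copy
        new_ids.append(new_id)
        copied += len(data)
    return new_ids, copied


# "Prod" block storage: block id -> bytes
prod_blocks = {0: b"a" * 4, 1: b"b" * 4, 2: b"c" * 2}
snapshot_view = [0, 1]  # snapshot taken before block 2 was appended

dr_blocks = {}          # "DR" block storage starts empty
dr_ids, copied_bytes = copy_file(prod_blocks, snapshot_view, dr_blocks)

prod_blocks.clear()     # simulate losing the Prod cluster
recovered = b"".join(dr_blocks[i] for i in dr_ids)
print(copied_bytes, recovered)  # 8 b'aaaabbbb'
```

Even after the Prod storage is cleared, the DR copy can be read in full, because the copy moved bytes rather than references.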