New Contributor
Posts: 4
Registered: 01-04-2018

HDFS Snapshot take full backup or incremental backup


Regarding the snapshot feature in Cloudera, I would like to know whether it takes a full backup or an incremental backup.


I ask because if it takes a full backup every time, there is a drawback. For example, if I add 2 MB of data to my 1000 GB file, it would back up the entire 1000 GB file again rather than just the 2 MB I changed.


Could someone please answer the above query?

New Contributor
Posts: 15
Registered: 03-07-2017

Re: HDFS Snapshot take full backup or incremental backup

The snapshot is not a full copy of the data but rather a copy of the metadata at that point in time. Blocks on the DataNodes are not copied: the snapshot files record the block list and the file size. No data is copied.
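To make the metadata-only idea concrete, here is a toy Python model of the 2 MB / 1000 GB example from the question. It is a hypothetical sketch, not HDFS code: the class and method names are invented for illustration, and as a simplification every append becomes a new block (real HDFS may extend the last block instead, which is why the snapshot also records the file size). It shows that taking a snapshot stores no extra data and that a later 2 MB append adds only 2 MB of storage, while the snapshot still reports the old size.

```python
"""Toy model of a metadata-only snapshot (hypothetical, not HDFS internals)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotEntry:
    block_ids: tuple  # block list captured at snapshot time
    size: int         # file size (MB) captured at snapshot time


class ToyFileSystem:
    def __init__(self):
        self.block_data = {}  # block_id -> length in MB (stands in for DataNode storage)
        self.files = {}       # path -> list of block ids (live metadata)
        self.snapshots = {}   # snapshot name -> {path: SnapshotEntry}
        self._next_block = 0

    def append(self, path, length_mb):
        """Simplification: store new data as a new block and add it to the block list."""
        block_id = self._next_block
        self._next_block += 1
        self.block_data[block_id] = length_mb
        self.files.setdefault(path, []).append(block_id)

    def file_size(self, path):
        return sum(self.block_data[b] for b in self.files[path])

    def create_snapshot(self, name):
        """Record block lists and sizes only; no block data is copied."""
        self.snapshots[name] = {
            path: SnapshotEntry(tuple(blocks), self.file_size(path))
            for path, blocks in self.files.items()
        }

    def stored_mb(self):
        """Total data held on the toy DataNodes."""
        return sum(self.block_data.values())


fs = ToyFileSystem()
fs.append("/data/big.dat", 1000 * 1024)  # 1000 GB expressed in MB
fs.create_snapshot("s1")
print(fs.stored_mb())                     # 1024000: the snapshot added no data
fs.append("/data/big.dat", 2)             # append 2 MB
print(fs.stored_mb())                     # 1024002: only the new 2 MB is stored
print(fs.snapshots["s1"]["/data/big.dat"].size)  # 1024000: snapshot keeps the old size
```

The snapshot and the live file share block 0; only the metadata (block list and size) is duplicated, so its cost does not grow with the file's size.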

New Contributor
Posts: 4
Registered: 01-04-2018

Re: HDFS Snapshot take full backup or incremental backup

Does that mean that when I copy my data from the Prod cluster to the DR cluster, only the metadata is copied and not the actual data? If so, I could lose my data if Prod goes down, because my DR cluster would have only the metadata and not the actual data.

Posts: 1,572
Kudos: 295
Solutions: 241
Registered: 07-31-2013

Re: HDFS Snapshot take full backup or incremental backup

@Amir - No.

What @RobertM describes concerns how capturing a snapshot works
internally within your HDFS, that is, the cost of taking the snapshot
in the distributed filesystem.

When copying data between clusters or storage systems, copying a
snapshotted file is no different from copying a regular file: both
are copied the same way, with their bytes and their metadata. There is
no "copy only metadata" operation.
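A toy Python sketch of this point follows. It is hypothetical (not HDFS or DistCp code, and the dictionaries and function names are invented for illustration). Reading a file through a snapshot resolves its recorded block list to real bytes, so the DR side receives data, not pointers. Clearing the source storage afterwards leaves the DR copy intact.

```python
"""Toy sketch (hypothetical): copying a snapshotted file transfers its bytes."""

# Toy "DataNode" storage on the Prod cluster: block id -> block bytes.
prod_blocks = {0: b"hello ", 1: b"world"}

# Live file metadata, and a snapshot entry taken before block 1 was appended.
live_file = {"blocks": [0, 1], "length": 11}
snapshot_file = {"blocks": [0], "length": 6}


def read_bytes(meta, blocks):
    """Resolve a block list to actual data, truncated to the recorded length."""
    data = b"".join(blocks[b] for b in meta["blocks"])
    return data[: meta["length"]]


def copy_to_dr(meta, src_blocks):
    """Return what lands on the DR side: the file's bytes plus its length."""
    data = read_bytes(meta, src_blocks)
    return {"data": data, "length": len(data)}


dr_live = copy_to_dr(live_file, prod_blocks)
dr_snap = copy_to_dr(snapshot_file, prod_blocks)
print(dr_live)  # {'data': b'hello world', 'length': 11}
print(dr_snap)  # {'data': b'hello ', 'length': 6}

# Simulate losing Prod: the DR copies hold their own bytes and are unaffected.
prod_blocks.clear()
print(dr_snap["data"])  # b'hello '
```

The live file and the snapshot view are copied by the same code path; the only difference is which block list and length are read.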