Created 09-30-2016 10:32 AM
@mqureshi Thank you for your response. Yes, HDFS snapshot is one of the options for point-in-time recovery. However, it seems there are many implications of using .snapshot, in particular the huge size of the .snapshot data and the complexity of restoring the backups.
My question is twofold:
a) Is HDFS snapshot the only way/approach to point-in-time recovery, or are there other approaches?
b) Does WANdisco Fusion (a DR product endorsed by Hortonworks) provide point-in-time recovery?
Many thanks.
Created 09-30-2016 05:13 PM
Snapshots do not create extra copies of blocks on the file system. Snapshots are stored along with the NameNode’s file system namespace. What do you mean by "huge size of snapshots and restoring the backups"? The entire point of a snapshot is to avoid creating extra copies of blocks on the file system and to restore a specific file, or everything, to a point in time.
a) There are always many ways to skin a cat, but what test did you do with HDFS snapshots that failed for you? Could you elaborate a little? That would help.
b) "Point in Time Recovery" - that is a question for WANdisco. We endorse HDFS snapshots first for this function; WANdisco or another tool is your option.
Created 10-05-2016 08:40 PM
A single-tool solution is desirable, but it also comes with a price tag. Look at the link above. You can use a combination of HDFS snapshots and your standard database point-in-time recovery methods for the database that holds the metadata. You can leverage that practice and avoid extra cost for something that is really not Hadoop-specific.
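For illustration, here is a minimal sketch of the database half of that combination, assuming the metastore lives in MySQL. The host, user, and database name are assumptions, and credentials are assumed to come from a client config file (for example ~/.my.cnf); mysqldump and the flags shown are standard MySQL tooling, nothing Hadoop-specific.

import datetime
import subprocess

stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
dump_file = "/backups/hive_metastore_" + stamp + ".sql"   # hypothetical backup path

# Consistent logical dump of the metastore database; --master-data=2 records
# the binary log coordinates in the dump, so it can later be rolled forward
# with mysqlbinlog to an exact point in time.
with open(dump_file, "w") as out:
    subprocess.run(
        ["mysqldump",
         "--single-transaction",
         "--master-data=2",
         "-h", "metastore-db.example.com",   # hypothetical host
         "-u", "hive_backup",                # hypothetical user
         "hive"],                            # hypothetical metastore database name
        stdout=out, check=True)

Pairing a dump like this with HDFS snapshots taken at roughly the same time gives a consistent point to roll both the data and the metadata back to.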
If any response from this thread helped, please vote/accept best answer.
Created 10-03-2016 08:36 AM
We have the option of either using HDFS snapshots or the WANdisco tool for designing point-in-time recovery for the cluster. However, we want to go with an approach/tool that covers backing up the Hadoop metastore and configuration files in addition to backing up the blocks on the data nodes.
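For concreteness, here is a rough sketch (Python, shelling out to standard tools) of what covering the metadata and configuration pieces could look like alongside the HDFS snapshots; the backup location and the configuration path are assumptions for illustration, while the fsimage fetch ("hdfs dfsadmin -fetchImage") is a standard command.

import datetime
import os
import shutil
import subprocess

stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
backup_root = "/backups/hadoop_meta_" + stamp      # hypothetical backup location
os.makedirs(backup_root, exist_ok=True)

# 1. Download the most recent fsimage (NameNode metadata) to local disk.
subprocess.run(["hdfs", "dfsadmin", "-fetchImage", backup_root], check=True)

# 2. Archive the cluster configuration files (path assumes an HDP-style layout).
shutil.make_archive(os.path.join(backup_root, "hadoop_conf"),
                    "gztar", "/etc/hadoop/conf")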
Looking forward to your expert advice on this.
Thanks.