What is the best approach or tool for point-in-time data recovery on the HDP 2.4 platform? Does WANdisco support point-in-time recovery?

Rising Star
 
1 ACCEPTED SOLUTION

Super Guru

@hitaay

Snapshots do not create extra copies of blocks on the file system. Snapshot metadata is stored along with the NameNode’s file system namespace. What do you mean by "huge size of snapshots and restoring the backups"? The entire point of a snapshot is to avoid copying blocks while still letting you restore a specific file, or everything, to a point in time.

a) There are always several ways to do this, but which test did you run with HDFS snapshots, and how did it fail you? Could you elaborate a little? That would help.

b) Whether WANdisco supports point-in-time recovery is a question for WANdisco. We recommend HDFS snapshots first for this function; WANdisco or another tool is your choice.
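For readers who have not tried snapshots yet, here is a minimal sketch of the create-and-restore cycle described above. It only builds the standard `hdfs` command lines and prints them by default; the directory `/data/sales`, the snapshot name `before-load`, and the helper names are hypothetical and not from this thread. It assumes restoring a single file by copying it back out of the read-only `.snapshot` directory.

```python
import posixpath
import subprocess


def allow_snapshot_cmd(directory):
    # An administrator must mark a directory as snapshottable once.
    return ["hdfs", "dfsadmin", "-allowSnapshot", directory]


def create_snapshot_cmd(directory, name):
    # Records the current state of the directory under the given name.
    return ["hdfs", "dfs", "-createSnapshot", directory, name]


def snapshot_path(directory, name, relative_path=""):
    # Snapshot contents are exposed under <directory>/.snapshot/<name>/...
    base = posixpath.join(directory, ".snapshot", name)
    relative_path = relative_path.lstrip("/")
    return posixpath.join(base, relative_path) if relative_path else base


def restore_file_cmd(directory, name, relative_path):
    # Copies one file from the snapshot back over the live copy (-f overwrites).
    src = snapshot_path(directory, name, relative_path)
    dst = posixpath.join(directory, relative_path.lstrip("/"))
    return ["hdfs", "dfs", "-cp", "-f", src, dst]


def run(cmd, dry_run=True):
    # Prints the command by default; set dry_run=False on a real cluster.
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical directory, snapshot name, and file, for illustration only.
    run(allow_snapshot_cmd("/data/sales"))
    run(create_snapshot_cmd("/data/sales", "before-load"))
    run(restore_file_cmd("/data/sales", "before-load", "2016/q1.csv"))
```

Because the restore is an ordinary copy from a read-only path, it needs no special tooling, but copying an entire directory tree back would need more care than this single-file case.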

5 REPLIES

Super Guru
@hitaay

Can you please elaborate on your question? You can simply use HDFS snapshots to create point-in-time backups. Here is a link on snapshots. If this is not what you are looking for, can you please explain what you need?

Rising Star

@mqureshi Thank you for your response. Yes, HDFS snapshots are one option for point-in-time recovery. However, there seem to be several implications of using .snapshot directories, particularly the large size of the snapshots and the complexity of restoring the backups.

My question is twofold:

a) Are HDFS snapshots the only approach to point-in-time recovery, or are there other approaches?

b) Does WANdisco Fusion (DR product, endorsed by Hortonworks) provide point in time recovery?

Many thanks.

Super Guru

@hitaay

https://community.hortonworks.com/questions/394/what-are-best-practices-for-setting-up-backup-and.ht...

A single-tool solution is desirable, but it also comes with a price tag. Look at the link above. You can combine HDFS snapshots with the standard point-in-time recovery methods of the database that stores the metadata. That lets you reuse existing practice and avoid extra cost for something that is not really Hadoop-specific.

If any response in this thread helped, please vote for it or accept the best answer.
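To make the combination above more concrete, here is a rough sketch that ties an HDFS snapshot, a metadata database dump, and a configuration archive to one shared timestamp tag. It rests on assumptions that are not from this thread: that the metastore lives in a MySQL database (here called `hive`), that the configuration sits in `/etc/hadoop/conf`, and that `/backups` is a writable local directory. The function name `backup_plan` is likewise hypothetical. It only builds the command lines; true point-in-time recovery of the database would additionally rely on that database's own log-based methods.

```python
from datetime import datetime, timezone


def backup_plan(hdfs_dir, metastore_db, conf_dir, backup_dir, now=None):
    # One tag names all three artifacts so they can be matched up at restore time.
    now = now or datetime.now(timezone.utc)
    tag = now.strftime("pitr-%Y%m%dT%H%M%SZ")
    return [
        # 1. HDFS data: snapshot of an already snapshottable directory.
        ["hdfs", "dfs", "-createSnapshot", hdfs_dir, tag],
        # 2. Metadata: consistent dump, assuming a MySQL-backed metastore.
        ["mysqldump", "--single-transaction", "--databases", metastore_db,
         f"--result-file={backup_dir}/{tag}-metastore.sql"],
        # 3. Configuration files: plain compressed archive.
        ["tar", "-czf", f"{backup_dir}/{tag}-conf.tar.gz", conf_dir],
    ]


if __name__ == "__main__":
    # Hypothetical paths and database name, for illustration only.
    for cmd in backup_plan("/data/sales", "hive", "/etc/hadoop/conf", "/backups"):
        print(" ".join(cmd))
```

The shared tag is the main design choice here: it does not make the three steps atomic, but it records which dump and which archive belong with which snapshot.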

Rising Star
@Constantin Stanca

We have the option of using either HDFS snapshots or the WANdisco tool to design point-in-time recovery for the cluster. However, we want an approach or tool that also covers backup of the Hadoop metastore and configuration files, in addition to backing up the blocks on the DataNodes.

I look forward to your expert advice on this.

Thanks.