
What is the best approach/tool for point in time data recovery with HDP2.4 platform? Does WANdisco support point in time recovery?


Explorer
 
1 ACCEPTED SOLUTION

Accepted Solutions

Re: What is the best approach/tool for point in time data recovery with HDP2.4 platform? Does WANdisco support point in time recovery?

@hitaay

Snapshots do not create extra copies of blocks on the file system; snapshot metadata is stored alongside the NameNode's file system namespace. What do you mean by "huge size of snapshots and restoring the backups"? The whole point of snapshots is to avoid extra block copies while letting you restore a specific file, or an entire directory, to a point in time.

a) There are always many ways to skin a cat, but what test did you run with HDFS snapshots, and how did it fail you? Could you elaborate a little? That would help.

b) "Point in time recovery" is a question for WANdisco. We recommend HDFS snapshots first for this purpose; WANdisco or another tool remains an option.
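As a sketch of the restore side of the snapshot workflow (the directory path, snapshot name, and file name below are hypothetical examples; the commands assume a running HDFS cluster and appropriate privileges):

```shell
# List all snapshottable directories on the cluster
hdfs lsSnapshottableDir

# Show what changed between a snapshot and the current state ("." = now)
hdfs snapshotDiff /data/sales before-etl-2016-06-01 .

# Restore a single file by copying it out of the read-only snapshot;
# unchanged blocks were never duplicated, only referenced by the snapshot
hdfs dfs -cp /data/sales/.snapshot/before-etl-2016-06-01/part-00000 /data/sales/
```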

5 REPLIES

Re: What is the best approach/tool for point in time data recovery with HDP2.4 platform? Does WANdisco support point in time recovery?

Super Guru
@hitaay

Can you please elaborate on your question? You can simply use HDFS snapshots to create point-in-time backups; here is a link on snapshots. If that is not what you are looking for, please provide more detail.
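For reference, creating a snapshot looks like this (the directory path and snapshot name are hypothetical examples; the commands assume a running HDFS cluster and admin privileges for the first step):

```shell
# Mark the directory as snapshottable (HDFS admin, one-time per directory)
hdfs dfsadmin -allowSnapshot /data/sales

# Take a named point-in-time snapshot
hdfs dfs -createSnapshot /data/sales before-etl-2016-06-01

# Snapshots live under a hidden, read-only .snapshot subdirectory
hdfs dfs -ls /data/sales/.snapshot
```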

Re: What is the best approach/tool for point in time data recovery with HDP2.4 platform? Does WANdisco support point in time recovery?

Explorer

@mqureshi Thank you for your response. Yes, HDFS snapshots are one option for point-in-time recovery. However, it seems there are many implications of using .snapshot directories, particularly their potentially large size and the complexity of restoring backups.

My question is twofold:

a) Is HDFS snapshot the only approach to point-in-time recovery, or are there other approaches?

b) Does WANdisco Fusion (the DR product endorsed by Hortonworks) provide point-in-time recovery?

Many thanks.


Re: What is the best approach/tool for point in time data recovery with HDP2.4 platform? Does WANdisco support point in time recovery?

@hitaay

https://community.hortonworks.com/questions/394/what-are-best-practices-for-setting-up-backup-and.ht...

A single-tool solution is desirable, but it also comes with a price tag. Look at the link above. You can use a combination of HDFS snapshots and your standard database point-in-time recovery methods for the database that holds the metadata. Leveraging that practice avoids extra cost for something that is not really Hadoop-specific.
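One way the combination above could look, assuming a MySQL-backed Hive metastore with binary logging enabled (the database name "hive", warehouse path, and dates are hypothetical examples):

```shell
# 1. HDFS data: snapshot the warehouse directory
hdfs dfs -createSnapshot /apps/hive/warehouse nightly-2016-06-01

# 2. Metastore: consistent dump that records the binlog coordinates,
#    so the DBA can replay binlogs up to any later point in time
mysqldump --single-transaction --master-data=2 hive > hive_metastore_2016-06-01.sql
```

The dump plus binlog replay gives the metadata side the same point-in-time granularity the snapshot gives the data side.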

If any response from this thread helped, please vote for and accept the best answer.

Re: What is the best approach/tool for point in time data recovery with HDP2.4 platform? Does WANdisco support point in time recovery?

Explorer
@Constantin Stanca

We have the option of either using HDFS snapshots or the WANdisco tool for designing point-in-time recovery for the cluster. However, we want an approach/tool that also covers backup of the Hadoop metastore and configuration files, in addition to backing up blocks on the data nodes.

Looking forward to your expert advice on this.

Thanks.
