
Backing up HDFS production data


Expert Contributor

Hi experts,

This question is mostly related to DR and backup.

We already have two clusters (identically configured; one is the primary and the other a hot standby). To mitigate the risk further, we are considering a 'cold backup' in which we store the HDFS data much like earlier tape-based backup solutions, kept in our own data center (not in the cloud).

We do not want to invest in another cluster and use a distcp-based approach. We only want to back up HDFS data.

What would be the best solution, approach, or design for this?

Let me know if you need more input.

Many thanks,

SS

Accepted Solutions

Re: Backing up HDFS production data

@Smart Solutions

The two main options for replicating the HDFS structure are Falcon and distcp. The distcp command is not very feature-rich: you give it a path in the HDFS structure and a destination cluster, and it copies everything to the same path on the destination. If the copy fails, you have to start it again.
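As a rough illustration of the distcp behavior described above, here is a minimal Python sketch that assembles a `hadoop distcp` invocation and simply re-runs the whole copy on failure. It assumes the `hadoop` client is on the PATH; the NameNode URIs (`prod-nn`, `backup-nn`), the path `/data`, and the helper names `build_distcp_command` and `run_with_retries` are hypothetical. The `-update` flag makes distcp skip files that appear unchanged at the destination, so a re-run avoids recopying files that already arrived, though it is still a full restart of the job rather than a true resume.

```python
import shlex
import subprocess
from typing import Callable, List, Optional


def build_distcp_command(
    src: str,
    dst: str,
    update: bool = True,
    preserve: bool = True,
    num_maps: Optional[int] = None,
) -> List[str]:
    """Assemble the argument list for a `hadoop distcp` run (hypothetical helper)."""
    cmd = ["hadoop", "distcp"]
    if update:
        # Copy only files that are missing or differ at the destination,
        # so a re-run after a failure skips files that already arrived.
        cmd.append("-update")
    if preserve:
        # Preserve file attributes such as permissions and ownership.
        cmd.append("-p")
    if num_maps is not None:
        # Upper bound on simultaneous copy map tasks.
        cmd.extend(["-m", str(num_maps)])
    cmd.extend([src, dst])
    return cmd


def run_with_retries(
    cmd: List[str],
    max_attempts: int = 3,
    runner: Callable[[List[str]], subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Re-run the whole copy until it exits with status 0 or attempts run out."""
    for _ in range(max_attempts):
        if runner(cmd).returncode == 0:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical NameNode addresses; replace with your own clusters.
    cmd = build_distcp_command(
        "hdfs://prod-nn:8020/data",
        "hdfs://backup-nn:8020/data",
        num_maps=20,
    )
    print(shlex.join(cmd))
    # run_with_retries(cmd)  # uncomment on a host with the Hadoop client
```

Note that with `-update`, distcp copies the contents of the source directory into the destination path, so pointing both sides at the same path (as here) keeps the directory layout identical.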

Another method for maintaining a replica of your HDFS structure is Falcon. It offers more data-movement options and lets you manage the lifecycle of the data on both sides more effectively.

If you are moving Hive table structures, there is extra complexity in making sure the tables are created on the DR side, but the underlying files are moved the same way.

You excluded distcp as an option, so I suggest looking at Falcon.

Check this: http://hortonworks.com/hadoop-tutorial/mirroring-datasets-between-hadoop-clusters-with-apache-falcon...

