Created on 10-10-2016 10:47 AM - edited 09-16-2022 03:43 AM
Hi experts,
This question is mostly related to DR and backup.
We already have two clusters (identical in configuration; one is the master and the other is a hot standby). To mitigate the risk further, we are thinking of a 'cold backup', where we can store the HDFS data much like previous tape-based backup solutions, and we want this stored in our own data center (not in the cloud).
We do not want to invest in another cluster or use a distcp-based approach. We want to back up only the HDFS data.
What would be the best solution/approach/design for this?
Let me know if more input is required.
Many thanks,
SS
Created 10-11-2016 01:23 AM
The two main options for replicating the HDFS structure are Falcon and distcp. The distcp command is not very feature rich: you give it a path in HDFS and a destination cluster, and it copies everything to the same path on the destination. If the copy fails, you will need to start it again, and so on.
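For reference, a minimal distcp invocation of that kind looks roughly like this (the namenode hosts, port, and path below are placeholders, not values from this thread):

```bash
# Copy /data/warehouse from the primary cluster to the same path on the
# standby cluster. -update skips files that already match on the target,
# and -p preserves file attributes such as ownership and permissions.
hadoop distcp -update -p \
  hdfs://nn-primary.example.com:8020/data/warehouse \
  hdfs://nn-standby.example.com:8020/data/warehouse
```

If the job fails partway through, rerunning the same command with -update only copies files that are still missing or differ on the target.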
Another method for maintaining a replica of your HDFS structure is Falcon. There are more data movement options and you can more effectively manage the lifecycle of all of the data on both sides.
If you're moving Hive table structures, there is some additional complexity in making sure the tables are created on the DR side, but moving the actual files is done the same way.
You excluded distcp as an option, so I suggest looking at Falcon.
Check this: http://hortonworks.com/hadoop-tutorial/mirroring-datasets-between-hadoop-clusters-with-apache-falcon...
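For a flavor of what driving Falcon from the command line involves, the entity workflow looks roughly like this (the entity XML file names and feed name below are hypothetical; you would author them for your own clusters and paths):

```bash
# Register the source and target clusters with Falcon, then submit and
# schedule a feed whose definition replicates data from source to target.
falcon entity -type cluster -submit -file primary-cluster.xml
falcon entity -type cluster -submit -file backup-cluster.xml
falcon entity -type feed -submit -file replication-feed.xml
falcon entity -type feed -schedule -name replication-feed
```

The feed definition is where you set the replication frequency and the retention on each side, which is the lifecycle management mentioned above.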
+++++++
If any response addressed your question, please vote and accept the best answer.
Created 10-10-2016 07:53 PM
@Smart Solution
Please refer to the link if it helps you: