Created 10-13-2015 10:07 PM
Created 10-14-2015 03:39 PM
Ideally, you don't need to back up HDFS since it stores 3 copies by default. If you need a DR strategy, a good approach is to run a separate cluster in another datacenter and use Apache Falcon or distcp to mirror the data to it. If you need to back up certain high-value datasets, take a snapshot of the data and back it up to tape (ugh!) or put it on your corporate SAN/NAS (if permitted). This gives you a way to recover the data if disaster strikes. I don't know if you are averse to cloud storage (based on your S3 comment), but it is cheap and always online, so the data is there to recover when needed.
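For the "snapshot, then copy elsewhere" approach, a minimal Python sketch might look like the following. It only assembles standard `hdfs dfsadmin -allowSnapshot`, `hdfs dfs -createSnapshot` and `hadoop distcp` command lines and, by default, prints them instead of running them. The directory `/data/sales`, the snapshot name and the DR NameNode URI `hdfs://dr-nn:8020` are hypothetical placeholders, and the helper names are made up for illustration; check the flags against your Hadoop version before relying on it.

```python
import shlex
import subprocess


def snapshot_and_mirror_cmds(path, snapshot_name, dr_namenode):
    """Build commands that snapshot a (non-root) HDFS directory and copy it to a DR cluster.

    path          -- HDFS directory to protect, e.g. "/data/sales" (hypothetical)
    snapshot_name -- name for the new snapshot, e.g. "s20151014" (hypothetical)
    dr_namenode   -- DR cluster NameNode URI, e.g. "hdfs://dr-nn:8020" (hypothetical)
    """
    path = path.rstrip("/")
    src_snapshot = f"{path}/.snapshot/{snapshot_name}"
    dr_target = f"{dr_namenode.rstrip('/')}{path}/{snapshot_name}"
    return [
        # One-time setup per directory (needs HDFS superuser): make it snapshottable.
        ["hdfs", "dfsadmin", "-allowSnapshot", path],
        # Read-only, point-in-time copy; later deletes or overwrites under `path`
        # do not change what the snapshot contains.
        ["hdfs", "dfs", "-createSnapshot", path, snapshot_name],
        # Copy the frozen snapshot to the DR cluster. The source path has no scheme,
        # so it resolves against the default filesystem of the cluster running distcp.
        ["hadoop", "distcp", src_snapshot, dr_target],
    ]


def run(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(snapshot_and_mirror_cmds("/data/sales", "s20151014", "hdfs://dr-nn:8020"))
```

Copying from the `.snapshot` path rather than the live directory means distcp sees a consistent view even if jobs keep writing to `/data/sales` during the copy.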
I hope this helps,
Eric
Created 10-14-2015 05:56 PM
"you don't need to back up HDFS since it stores 3 copies by default": IMHO, we need to be careful with that message. Having replicas doesn't protect us against human error or a rogue administrator (hdfs dfs -rmr /), nor against an application bug, because a delete or a bad write is applied to every replica.
It's just like RAID 1: it's good, but no IT department would consider it a backup.
Created 10-15-2015 12:17 PM
@Cassandra
- HDFS snapshots
- HBase snapshots
- Hive metadata (a DBA can set up backups for this based on the database flavor used for HCatalog)
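As a rough illustration of the HBase and Hive items, here is a hedged Python sketch in the same dry-run style: it prints an HBase shell `snapshot` statement, an `ExportSnapshot` command that copies the snapshot to a DR cluster, and a metastore dump. The table name, snapshot name and `hdfs://dr-nn:8020/hbase` target are hypothetical, the helper names are made up for illustration, and the dump step ASSUMES the Hive metastore runs on MySQL (other database flavors need their own tooling, as noted above).

```python
import shlex
import subprocess


def hbase_snapshot_cmds(table, snapshot_name, dr_hbase_root):
    """Return (HBase shell statement, ExportSnapshot command) for one table.

    dr_hbase_root -- HBase root dir on the DR cluster, e.g. "hdfs://dr-nn:8020/hbase" (hypothetical)
    """
    # Statement for the HBase shell: take a snapshot of the table.
    shell_stmt = f"snapshot '{table}', '{snapshot_name}'"
    # MapReduce job that copies the snapshot's metadata and HFiles to the DR cluster.
    export_cmd = [
        "hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot_name,
        "-copy-to", dr_hbase_root,
    ]
    return shell_stmt, export_cmd


def hive_metastore_dump_cmd(db_name, out_file):
    """Dump the Hive metastore database. ASSUMES a MySQL-backed metastore."""
    return ["mysqldump", "--single-transaction", f"--result-file={out_file}", db_name]


def run(shell_stmt, cmds, dry_run=True):
    """Print (and optionally execute) the HBase shell statement, then each command."""
    print(f"echo {shlex.quote(shell_stmt)} | hbase shell")
    if not dry_run:
        # The HBase shell reads statements from stdin and exits at end of input.
        subprocess.run(["hbase", "shell"], input=shell_stmt + "\n", text=True, check=True)
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    stmt, export = hbase_snapshot_cmds("orders", "orders_20151015", "hdfs://dr-nn:8020/hbase")
    run(stmt, [export, hive_metastore_dump_cmd("hive", "/backup/hive_metastore.sql")])
```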
Going back to your original question
This is helpful for understanding the architecture. We can point it at a DR cluster (on premises or in the cloud), as Eric mentioned.