
What is a suggested offsite/cold backup method for HDFS, besides AWS S3?

1 ACCEPTED SOLUTION

Re: What is a suggested offsite/cold backup method for HDFS? besides AWS S3

Explorer

@Cassandra,

Ideally, you don't need to back up HDFS since it stores 3 copies by default. If you need a DR strategy, a good approach is to run a separate cluster in another datacenter and use Apache Falcon or distcp to mirror the data to it. If you need to back up certain high-value datasets, take a snapshot of the data and back it up to tape (ugh!) or put it on your corporate SAN/NAS (if permitted). This gives you a way to recover the data if disaster strikes. I don't know if you are averse to cloud storage (based on your S3 comment), but it is cheap and always online, so you can recover data whenever you need to.
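To make the snapshot-then-mirror idea concrete, here is a rough sketch of one backup run, written as a small Python wrapper around the standard `hdfs` and `hadoop distcp` command-line tools. The directory `/data/high_value`, the DR NameNode address `dr-nn.example.com:8020`, and the snapshot naming scheme are placeholders I made up for illustration, not anything from your environment. It only prints the commands unless you turn off `dry_run`.

```python
import shlex
import subprocess
from datetime import date


def backup_commands(src_dir, dr_uri, snapshot_name):
    """Return the CLI commands (as argv lists) for one snapshot-and-mirror run."""
    snapshot_path = f"{src_dir.rstrip('/')}/.snapshot/{snapshot_name}"
    return [
        # Admin step: mark the directory as snapshottable.
        ["hdfs", "dfsadmin", "-allowSnapshot", src_dir],
        # Take a read-only, point-in-time snapshot of the directory.
        ["hdfs", "dfs", "-createSnapshot", src_dir, snapshot_name],
        # Copy the frozen snapshot (not the live directory) to the DR cluster.
        ["hadoop", "distcp", "-update", snapshot_path, dr_uri],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical paths and host name; replace with your own.
    name = f"backup-{date.today():%Y%m%d}"
    run(backup_commands(
        "/data/high_value",
        "hdfs://dr-nn.example.com:8020/backups/high_value",
        name,
    ))
```

Copying from the `.snapshot` path rather than the live directory means distcp reads a consistent view even if jobs keep writing to the source while the copy runs.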

I hope this helps,

Eric

Re: What is a suggested offsite/cold backup method for HDFS? besides AWS S3

Expert Contributor

"you don't need to back up HDFS since it stores 3 copies by default": IMHO, we need to be careful with that message. Having replicas doesn't protect us against human error or a rogue administrator (hdfs dfs -rmr /), nor against an application bug.

It's just like RAID1: it's good, but no IT department would consider it a backup.
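For the human-error case, a snapshot is what lets you get a deleted file back, because replication deletes all three copies together. As a rough illustration, the sketch below builds the `hdfs dfs -cp` command that copies a path out of a snapshot's read-only `.snapshot` directory back into the live tree. The directory, snapshot name, and file path are hypothetical, and it assumes a snapshot was taken before the deletion.

```python
import posixpath
import shlex


def restore_command(snapshot_root, snapshot_name, relative_path):
    """Build the `hdfs dfs -cp` call that copies one path out of a snapshot."""
    rel = relative_path.lstrip("/")
    # Snapshots are exposed under <snapshottable dir>/.snapshot/<name>/...
    source = posixpath.join(snapshot_root, ".snapshot", snapshot_name, rel)
    target = posixpath.join(snapshot_root, rel)
    return ["hdfs", "dfs", "-cp", source, target]


if __name__ == "__main__":
    # Hypothetical directory, snapshot name, and file.
    print(shlex.join(restore_command(
        "/data/high_value", "backup-20240101", "orders/part-00000"
    )))
```

Keep in mind that a snapshot lives on the same cluster as the data, so it covers accidental deletes but not the loss of the cluster itself.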

Re: What is a suggested offsite/cold backup method for HDFS? besides AWS S3

@Cassandra

HDFS snapshots, HBase snapshots, and Hive metadata (a DBA can set this up based on the DB flavor used for HCatalog).

Going back to your original question

This is helpful for understanding the architecture. We can point it to a DR cluster (it can be on-prem or in the cloud), as Eric mentioned.
