<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: What is a suggested offsite/cold backup method for HDFS? besides AWS S3 in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-a-suggested-offsite-cold-backup-method-for-HDFS/m-p/95357#M8707</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/357/cspencer.html" nodeid="357"&gt;@Cassandra&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;Ideally, you don't need to back up HDFS, since it stores 3 copies by default. If you need a DR strategy, a good approach is to have a separate cluster in another datacenter. Use Apache Falcon or distcp to mirror the data to the DR cluster. If you need to back up certain high-value datasets, take a snapshot of the data and back it up to tape (ugh!) or put it on your corporate SAN/NAS (if permitted). This will give you a way to recover the data if disaster strikes. I don't know if you are averse to cloud storage (based on your S3 comment), but it is cheap and always online, so you can recover data when needed.&lt;/P&gt;&lt;P&gt;I hope this helps,&lt;/P&gt;&lt;P&gt;Eric&lt;/P&gt;</description>
    <pubDate>Wed, 14 Oct 2015 22:39:30 GMT</pubDate>
    <dc:creator>emizell</dc:creator>
    <dc:date>2015-10-14T22:39:30Z</dc:date>
    <item>
      <title>What is a suggested offsite/cold backup method for HDFS? besides AWS S3</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-a-suggested-offsite-cold-backup-method-for-HDFS/m-p/95356#M8706</link>
      <description />
      <pubDate>Wed, 14 Oct 2015 05:07:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-a-suggested-offsite-cold-backup-method-for-HDFS/m-p/95356#M8706</guid>
      <dc:creator>cspencer</dc:creator>
      <dc:date>2015-10-14T05:07:33Z</dc:date>
    </item>
    <item>
      <title>Re: What is a suggested offsite/cold backup method for HDFS? besides AWS S3</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-a-suggested-offsite-cold-backup-method-for-HDFS/m-p/95357#M8707</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/357/cspencer.html" nodeid="357"&gt;@Cassandra&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;Ideally, you don't need to back up HDFS, since it stores 3 copies by default. If you need a DR strategy, a good approach is to have a separate cluster in another datacenter. Use Apache Falcon or distcp to mirror the data to the DR cluster. If you need to back up certain high-value datasets, take a snapshot of the data and back it up to tape (ugh!) or put it on your corporate SAN/NAS (if permitted). This will give you a way to recover the data if disaster strikes. I don't know if you are averse to cloud storage (based on your S3 comment), but it is cheap and always online, so you can recover data when needed.&lt;/P&gt;&lt;P&gt;I hope this helps,&lt;/P&gt;&lt;P&gt;Eric&lt;/P&gt;</description>
      <pubDate>Wed, 14 Oct 2015 22:39:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-a-suggested-offsite-cold-backup-method-for-HDFS/m-p/95357#M8707</guid>
      <dc:creator>emizell</dc:creator>
      <dc:date>2015-10-14T22:39:30Z</dc:date>
    </item>
    <item>
      <title>Re: What is a suggested offsite/cold backup method for HDFS? besides AWS S3</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-a-suggested-offsite-cold-backup-method-for-HDFS/m-p/95358#M8708</link>
      <description>&lt;P&gt;"you don't need to backup HDFS since it stores 3 copies by default": IMHO, we need to be careful with that message. Having replicas doesn't protect us against "human error" or a rogue administrator (hdfs dfs -rmr /), nor against an application bug.&lt;/P&gt;&lt;P&gt;It's just like RAID1: it's good, but no IT department would consider it a backup.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Oct 2015 00:56:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-a-suggested-offsite-cold-backup-method-for-HDFS/m-p/95358#M8708</guid>
      <dc:creator>sluangsay</dc:creator>
      <dc:date>2015-10-15T00:56:29Z</dc:date>
    </item>
    <item>
      <title>Re: What is a suggested offsite/cold backup method for HDFS? besides AWS S3</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-a-suggested-offsite-cold-backup-method-for-HDFS/m-p/95359#M8709</link>
      <description>&lt;P&gt;@Cassandra&lt;/P&gt;&lt;P&gt;Also look at &lt;A target="_blank" href="https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html"&gt;HDFS Snapshots&lt;/A&gt;, &lt;A target="_blank" href="http://hbase.apache.org/0.94/book/ops.snapshots.html"&gt;HBase Snapshots&lt;/A&gt;, and Hive metadata (a DBA can set this up based on the DB flavor used for HCatalog).&lt;/P&gt;&lt;P&gt;Going back to your original question:&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="http://hortonworks.com/hadoop-tutorial/incremental-backup-data-hdp-azure-disaster-recovery-burst-capacity/"&gt;This&lt;/A&gt; is helpful for understanding the architecture. We can point it to a DR cluster (it can be on-prem or in the cloud), as Eric mentioned.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Oct 2015 19:17:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-a-suggested-offsite-cold-backup-method-for-HDFS/m-p/95359#M8709</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2015-10-15T19:17:33Z</dc:date>
    </item>
  </channel>
</rss>

