<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: HBase HA vs HDFS replication... in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-HA-vs-HDFS-replication/m-p/189946#M59018</link>
    <description>&lt;P&gt;After more reading, it seems that region replication may be used for &lt;EM&gt;&lt;STRONG&gt;read &lt;/STRONG&gt;&lt;/EM&gt;high availability...&lt;/P&gt;&lt;P&gt;If I understand correctly, when an RS fails, its regions are moved to other "valid" region servers and remain available, but the move may take a while... So the purpose of region replication is &lt;EM&gt;just &lt;/EM&gt;to reduce this waiting period? It has nothing to do with physical data replication to guarantee that we won't lose any data, right?&lt;/P&gt;</description>
    <pubDate>Thu, 06 Apr 2017 16:15:54 GMT</pubDate>
    <dc:creator>schausson</dc:creator>
    <dc:date>2017-04-06T16:15:54Z</dc:date>
    <item>
      <title>HBase HA vs HDFS replication...</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-HA-vs-HDFS-replication/m-p/189945#M59017</link>
      <description>&lt;P&gt;Hi, &lt;/P&gt;&lt;P&gt;I'm currently looking at the "HA" feature of HBase, but cannot figure out exactly how it works. &lt;/P&gt;&lt;P&gt;I first created tables using the default Java API, without specifying any region replication value, thinking that the default HDFS replication mechanism would guarantee data availability.
Indeed, when I look at region files on HDFS, they are shown with a replication factor of 3: &lt;/P&gt;&lt;P&gt;Ex: &lt;/P&gt;&lt;P&gt; [myuser@myhost ~]$ hdfs dfs -ls /apps/hbase/data/data/default/MY_TEST_TABLE/f24af874470de9b85c2e1bd0ff5f80b3/0
Found 1 items
-rw-------  3 hbase hdfs  12234 2017-03-29 15:44 /apps/hbase/data/data/default/MY_TEST_TABLE/f24af874470de9b85c2e1bd0ff5f80b3/0/125b6555b2274e64b1ba4e9a8ef42885&lt;/P&gt;&lt;P&gt;So why should I set a region replication value (e.g. 3) in addition to the default HDFS one?
Does it mean that my data will eventually be replicated 9 times? &lt;/P&gt;&lt;P&gt;Thanks for any clue about this...&lt;/P&gt;&lt;P&gt;Sebastien&lt;/P&gt;</description>
      <pubDate>Thu, 06 Apr 2017 14:27:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-HA-vs-HDFS-replication/m-p/189945#M59017</guid>
      <dc:creator>schausson</dc:creator>
      <dc:date>2017-04-06T14:27:07Z</dc:date>
    </item>
    <item>
      <title>Re: HBase HA vs HDFS replication...</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-HA-vs-HDFS-replication/m-p/189946#M59018</link>
      <description>&lt;P&gt;After more reading, it seems that region replication may be used for &lt;EM&gt;&lt;STRONG&gt;read &lt;/STRONG&gt;&lt;/EM&gt;high availability...&lt;/P&gt;&lt;P&gt;If I understand correctly, when an RS fails, its regions are moved to other "valid" region servers and remain available, but the move may take a while... So the purpose of region replication is &lt;EM&gt;just &lt;/EM&gt;to reduce this waiting period? It has nothing to do with physical data replication to guarantee that we won't lose any data, right?&lt;/P&gt;</description>
      <pubDate>Thu, 06 Apr 2017 16:15:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-HA-vs-HDFS-replication/m-p/189946#M59018</guid>
      <dc:creator>schausson</dc:creator>
      <dc:date>2017-04-06T16:15:54Z</dc:date>
    </item>
    <item>
      <title>Re: HBase HA vs HDFS replication...</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-HA-vs-HDFS-replication/m-p/189947#M59019</link>
      <description>&lt;P&gt;Yes, exactly! Data stored on HDFS is not affected in any way, so all files used by a single HBase region are still replicated only 3 times. What is additionally replicated to achieve RS HA are read-only secondary region replicas held by other Region Servers. You can find a good explanation &lt;A href="https://hortonworks.com/blog/apache-hbase-high-availability-next-level/"&gt;here&lt;/A&gt;. What you get in return is faster recovery for reads from HBase. For writes, you still need to wait (just as without RS HA) until the HBase master reassigns the affected regions to other Region Servers.&lt;/P&gt;</description>
      <pubDate>Thu, 06 Apr 2017 16:37:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-HA-vs-HDFS-replication/m-p/189947#M59019</guid>
      <dc:creator>pminovic</dc:creator>
      <dc:date>2017-04-06T16:37:12Z</dc:date>
    </item>
  </channel>
</rss>

