<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: HIVE/HBASE hdfs replication in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/HIVE-HBASE-hdfs-replication/m-p/104652#M67549</link>
    <description>&lt;P&gt;Thanks for the insights.&lt;/P&gt;</description>
    <pubDate>Wed, 15 Feb 2017 02:11:20 GMT</pubDate>
    <dc:creator>vtpcnk</dc:creator>
    <dc:date>2017-02-15T02:11:20Z</dc:date>
    <item>
      <title>HIVE/HBASE hdfs replication</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HIVE-HBASE-hdfs-replication/m-p/104647#M67544</link>
      <description>&lt;P&gt;My question is NOT about HIVE/HBASE replication across clusters.&lt;/P&gt;&lt;P&gt;Rather, since HIVE and HBASE sit on top of HDFS, does the default HDFS replication factor apply to HIVE and HBASE data? In other words, within a single cluster running HIVE or HBASE, are there three copies (the default replication factor) of each HIVE/HBASE table's data stored across HDFS?&lt;/P&gt;&lt;P&gt;Appreciate the insights.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Feb 2017 22:50:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HIVE-HBASE-hdfs-replication/m-p/104647#M67544</guid>
      <dc:creator>vtpcnk</dc:creator>
      <dc:date>2017-02-14T22:50:09Z</dc:date>
    </item>
    <item>
      <title>Re: HIVE/HBASE hdfs replication</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HIVE-HBASE-hdfs-replication/m-p/104648#M67545</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;The short answer is yes. For example, HBase stores all of its files on HDFS, so these files are replicated according to the replication factor of the underlying HDFS configuration. HBase itself does not store the data multiple times, because that is the responsibility of the underlying file system.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Feb 2017 22:58:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HIVE-HBASE-hdfs-replication/m-p/104648#M67545</guid>
      <dc:creator>jgub</dc:creator>
      <dc:date>2017-02-14T22:58:48Z</dc:date>
    </item>
    <item>
      <title>Re: HIVE/HBASE hdfs replication</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HIVE-HBASE-hdfs-replication/m-p/104649#M67546</link>
      <description>&lt;P&gt;
	For Hive, files created in /apps/hive/warehouse/&amp;lt;database name&amp;gt;/&amp;lt;tablename&amp;gt;/data honor the dfs.replication factor by default (unless the user explicitly sets a replication factor for a file or for the files under a directory).&lt;/P&gt;&lt;P&gt;
	For example, I have a database testnumber with a table numberstringtest (stored in text format) whose data files each contain one row. In the output below, column 2 shows the replication factor, which is 3 in my case.&lt;/P&gt;&lt;PRE&gt;$ hdfs dfs -ls /apps/hive/warehouse/testnumber.db/numberstringtest/
Found 5 items
-rw-r--r--   3 hadoopadmin hdfs          9 2017-02-09 16:31 /apps/hive/warehouse/testnumber.db/numberstringtest/000000_0
-rw-r--r--   3 hadoopadmin hdfs          9 2017-02-09 16:31 /apps/hive/warehouse/testnumber.db/numberstringtest/000000_0_copy_1
-rw-r--r--   3 hadoopadmin hdfs         10 2017-02-09 16:31 /apps/hive/warehouse/testnumber.db/numberstringtest/000000_0_copy_2
-rw-r--r--   3 hadoopadmin hdfs         10 2017-02-09 16:31 /apps/hive/warehouse/testnumber.db/numberstringtest/000000_0_copy_3
-rw-r--r--   3 hadoopadmin hdfs         10 2017-02-09 16:31 /apps/hive/warehouse/testnumber.db/numberstringtest/000000_0_copy_4&lt;/PRE&gt;&lt;P&gt;Below is the command I would use to find where the replicated blocks of a file are stored.&lt;/P&gt;&lt;P&gt;$ hdfs fsck /apps/hive/warehouse/testnumber.db/numberstringtest/000000_0 -files -locations -blocks&lt;/P&gt;&lt;PRE&gt;Connecting to namenode via &lt;A href="http://hdp-ranger-1.openstacklocal:50070/fsck?ugi=hdfs&amp;amp;files=1&amp;amp;locations=1&amp;amp;blocks=1&amp;amp;path=%2Fapps%2Fhive%2Fwarehouse%2Ftestnumber.db%2Fnumberstringtest%2F000000_0" target="_blank"&gt;http://hdp-ranger-1.openstacklocal:50070/fsck?ugi=hdfs&amp;amp;files=1&amp;amp;locations=1&amp;amp;blocks=1&amp;amp;path=%2Fapps%2Fhive%2Fwarehouse%2Ftestnumber.db%2Fnumberstringtest%2F000000_0&lt;/A&gt;
FSCK started by hdfs (auth:KERBEROS_SSL) from /172.26.92.141 for path /apps/hive/warehouse/testnumber.db/numberstringtest/000000_0 at Tue Feb 14 15:07:58 UTC 2017
/apps/hive/warehouse/testnumber.db/numberstringtest/000000_0 9 bytes, 1 block(s):  OK
0. BP-1221127906-172.26.92.141-1485863848635:blk_1073744644_3922 len=9 repl=3 [DatanodeInfoWithStorage[172.26.92.142:1019,DS-bc1af702-2112-4c84-880c-506934af5309,DISK], DatanodeInfoWithStorage[172.26.92.141:1019,DS-51baba7f-9220-481c-9957-8a33fb1c1bb7,DISK], DatanodeInfoWithStorage[172.26.92.143:1019,DS-ef06873b-50e4-4c3b-a423-4b174cd465d8,DISK]]


Status: HEALTHY
 Total size:	9 B
 Total dirs:	0
 Total files:	1
 Total symlinks:		0
 Total blocks (validated):	1 (avg. block size 9 B)
 Minimally replicated blocks:	1 (100.0 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	3
 Average block replication:	3.0
 Corrupt blocks:		0
 Missing replicas:		0 (0.0 %)
 Number of data-nodes:		4
 Number of racks:		1
FSCK ended at Tue Feb 14 15:07:58 UTC 2017 in 3 milliseconds




The filesystem under path '/apps/hive/warehouse/testnumber.db/numberstringtest/000000_0' is HEALTHY
&lt;/PRE&gt;&lt;P&gt;
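	As a rough sketch (my own illustration, not part of any Hadoop tooling), the replication column of the listing above could also be read programmatically. This assumes the eight-column layout printed by hdfs dfs -ls and paths without spaces; the function names are hypothetical.&lt;/P&gt;&lt;PRE&gt;
```python
# Two sample lines copied from the `hdfs dfs -ls` listing above.
LISTING = """\
Found 5 items
-rw-r--r--   3 hadoopadmin hdfs          9 2017-02-09 16:31 /apps/hive/warehouse/testnumber.db/numberstringtest/000000_0
-rw-r--r--   3 hadoopadmin hdfs         10 2017-02-09 16:31 /apps/hive/warehouse/testnumber.db/numberstringtest/000000_0_copy_2
"""


def replication_by_path(listing):
    """Map each file path to the replication factor shown in column 2."""
    result = {}
    for line in listing.splitlines():
        fields = line.split()
        # Entries have 8 fields; this skips the "Found N items" header.
        if len(fields) != 8:
            continue
        # Directories show "-" in column 2, so only numeric values are files.
        if not fields[1].isdigit():
            continue
        result[fields[7]] = int(fields[1])
    return result


def raw_bytes(listing):
    """Bytes held on DataNodes: each file size times its replication factor."""
    total = 0
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) == 8 and fields[1].isdigit():
            total += int(fields[4]) * int(fields[1])
    return total


if __name__ == "__main__":
    for path, repl in replication_by_path(LISTING).items():
        print(repl, path)
    print("raw bytes on disk:", raw_bytes(LISTING))
```
&lt;/PRE&gt;&lt;P&gt;With the two sample lines, raw_bytes(LISTING) is 57 (9 x 3 + 10 x 3), which is just another way of seeing that every byte is stored three times.&lt;/P&gt;&lt;P&gt;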
	I am not sure about HBase, but I would expect the dfs.replication factor to be honored by default there as well, unless a replication factor is set explicitly for a file in HDFS.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Feb 2017 23:15:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HIVE-HBASE-hdfs-replication/m-p/104649#M67546</guid>
      <dc:creator>cravani</dc:creator>
      <dc:date>2017-02-14T23:15:44Z</dc:date>
    </item>
    <item>
      <title>Re: HIVE/HBASE hdfs replication</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HIVE-HBASE-hdfs-replication/m-p/104650#M67547</link>
      <description>&lt;P&gt;Are 000000_0_copy_1, 000000_0_copy_2 and 000000_0_copy_3 the HDFS replication copies of 000000_0?&lt;/P&gt;&lt;P&gt;Or are they independent tables that you had created?&lt;/P&gt;&lt;P&gt;Appreciate the feedback.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Feb 2017 23:40:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HIVE-HBASE-hdfs-replication/m-p/104650#M67547</guid>
      <dc:creator>vtpcnk</dc:creator>
      <dc:date>2017-02-14T23:40:40Z</dc:date>
    </item>
    <item>
      <title>Re: HIVE/HBASE hdfs replication</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HIVE-HBASE-hdfs-replication/m-p/104651#M67548</link>
      <description>&lt;P&gt;Another related question: if cluster replication is enabled for HBASE/HIVE for HA, is HDFS replication still required? In such cases, isn't the default replication factor of 3 overkill? Is it possible to reduce the HDFS replication factor to 2 (one extra copy) in such cases?
&lt;/P&gt;&lt;P&gt;Any insights on what the standard practice across the industry is? &lt;/P&gt;&lt;P&gt;Appreciate the feedback.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Feb 2017 00:12:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HIVE-HBASE-hdfs-replication/m-p/104651#M67548</guid>
      <dc:creator>vtpcnk</dc:creator>
      <dc:date>2017-02-15T00:12:01Z</dc:date>
    </item>
    <item>
      <title>Re: HIVE/HBASE hdfs replication</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HIVE-HBASE-hdfs-replication/m-p/104652#M67549</link>
      <description>&lt;P&gt;Thanks for the insights.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Feb 2017 02:11:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HIVE-HBASE-hdfs-replication/m-p/104652#M67549</guid>
      <dc:creator>vtpcnk</dc:creator>
      <dc:date>2017-02-15T02:11:20Z</dc:date>
    </item>
  </channel>
</rss>

