<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Understanding HBase HDFS usage in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Understanding-HBase-HDFS-usage/m-p/176926#M75622</link>
    <description>&lt;P&gt;Using HDP-2.6.0.3, I ran&lt;/P&gt;&lt;PRE&gt;hbase org.apache.hadoop.hbase.util.LoadTestTool -compression NONE -write 8:8 -num_keys 1048576&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;which generated an HBase table with the following characteristics:&lt;BR /&gt;&lt;BR /&gt;1048576 rows&lt;BR /&gt;row key length 39 bytes&lt;BR /&gt;8 columns/row with a mean size of 8 bytes each&lt;BR /&gt;&lt;BR /&gt;That should add up to a storage requirement of approximately&lt;BR /&gt;&lt;BR /&gt;1048576*(39+8*8) = 108003328 bytes =~ 103 MB&lt;BR /&gt;&lt;BR /&gt;When I check the storage usage for that HBase table in HDFS:&lt;BR /&gt;&lt;/P&gt;&lt;PRE&gt;hdfs dfs -du -h -s /apps/hbase/data/data/default/cluster_test&lt;/PRE&gt;&lt;P&gt;gives&lt;/P&gt;&lt;PRE&gt;853.7 M  /apps/hbase/data/data/default/cluster_test&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;I have an HDFS replication factor of 3; however, &lt;/P&gt;&lt;PRE&gt;hdfs dfs -du &lt;/PRE&gt;&lt;P&gt;should give the disk usage "before" replication anyway.&lt;BR /&gt;&lt;BR /&gt;HBase Region replication for the table is 1:&lt;BR /&gt;&lt;BR /&gt;hbase(main):001:0&amp;gt; describe 'cluster_test'&lt;BR /&gt;Table cluster_test is ENABLED   &lt;BR /&gt;cluster_test, {TABLE_ATTRIBUTES =&amp;gt; {DURABILITY =&amp;gt; 'USE_DEFAULT', REGION_REPLICAT&lt;BR /&gt;ION =&amp;gt; '1'}   &lt;BR /&gt;COLUMN FAMILIES DESCRIPTION   &lt;BR /&gt;{NAME =&amp;gt; 'test_cf', BLOOMFILTER =&amp;gt; 'ROW', VERSIONS =&amp;gt; '1', IN_MEMORY =&amp;gt; 'false',&lt;BR /&gt; KEEP_DELETED_CELLS =&amp;gt; 'FALSE', DATA_BLOCK_ENCODING =&amp;gt; 'NONE', TTL =&amp;gt; 'FOREVER',&lt;BR /&gt; COMPRESSION =&amp;gt; 'NONE', MIN_VERSIONS =&amp;gt; '0', BLOCKCACHE =&amp;gt; 'true', BLOCKSIZE =&amp;gt; &lt;BR /&gt;'65536', REPLICATION_SCOPE =&amp;gt; '0'}   &lt;BR /&gt;1 row(s) in 0.2240 seconds&lt;/P&gt;&lt;P&gt;&lt;A 
href="https://community.hortonworks.com/questions/46350/how-much-actual-space-required-to-store-10gb-to-hd.html"&gt;https://community.hortonworks.com/questions/46350/how-much-actual-space-required-to-store-10gb-to-hd.html&lt;/A&gt;&lt;BR /&gt;mentions higher disk usage in HBase.&lt;BR /&gt;&lt;A href="http://blog.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/"&gt;http://blog.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/&lt;/A&gt;&lt;BR /&gt;mentions a doubling of disk usage during compactions.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Could someone please clarify why the HBase table is using more than 8x the space in HDFS compared to the actual data stored in the table?&lt;/P&gt;&lt;BR /&gt;&lt;P&gt;What am I missing here?&lt;/P&gt;</description>
    <pubDate>Sat, 10 Mar 2018 03:44:10 GMT</pubDate>
    <dc:creator>pheinzlr</dc:creator>
    <dc:date>2018-03-10T03:44:10Z</dc:date>
    <item>
      <title>Understanding HBase HDFS usage</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Understanding-HBase-HDFS-usage/m-p/176926#M75622</link>
      <description>&lt;P&gt;Using HDP-2.6.0.3, I ran&lt;/P&gt;&lt;PRE&gt;hbase org.apache.hadoop.hbase.util.LoadTestTool -compression NONE -write 8:8 -num_keys 1048576&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;which generated an HBase table with the following characteristics:&lt;BR /&gt;&lt;BR /&gt;1048576 rows&lt;BR /&gt;row key length 39 bytes&lt;BR /&gt;8 columns/row with a mean size of 8 bytes each&lt;BR /&gt;&lt;BR /&gt;That should add up to a storage requirement of approximately&lt;BR /&gt;&lt;BR /&gt;1048576*(39+8*8) = 108003328 bytes =~ 103 MB&lt;BR /&gt;&lt;BR /&gt;When I check the storage usage for that HBase table in HDFS:&lt;BR /&gt;&lt;/P&gt;&lt;PRE&gt;hdfs dfs -du -h -s /apps/hbase/data/data/default/cluster_test&lt;/PRE&gt;&lt;P&gt;gives&lt;/P&gt;&lt;PRE&gt;853.7 M  /apps/hbase/data/data/default/cluster_test&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;I have an HDFS replication factor of 3; however, &lt;/P&gt;&lt;PRE&gt;hdfs dfs -du &lt;/PRE&gt;&lt;P&gt;should give the disk usage "before" replication anyway.&lt;BR /&gt;&lt;BR /&gt;HBase Region replication for the table is 1:&lt;BR /&gt;&lt;BR /&gt;hbase(main):001:0&amp;gt; describe 'cluster_test'&lt;BR /&gt;Table cluster_test is ENABLED   &lt;BR /&gt;cluster_test, {TABLE_ATTRIBUTES =&amp;gt; {DURABILITY =&amp;gt; 'USE_DEFAULT', REGION_REPLICAT&lt;BR /&gt;ION =&amp;gt; '1'}   &lt;BR /&gt;COLUMN FAMILIES DESCRIPTION   &lt;BR /&gt;{NAME =&amp;gt; 'test_cf', BLOOMFILTER =&amp;gt; 'ROW', VERSIONS =&amp;gt; '1', IN_MEMORY =&amp;gt; 'false',&lt;BR /&gt; KEEP_DELETED_CELLS =&amp;gt; 'FALSE', DATA_BLOCK_ENCODING =&amp;gt; 'NONE', TTL =&amp;gt; 'FOREVER',&lt;BR /&gt; COMPRESSION =&amp;gt; 'NONE', MIN_VERSIONS =&amp;gt; '0', BLOCKCACHE =&amp;gt; 'true', BLOCKSIZE =&amp;gt; &lt;BR /&gt;'65536', REPLICATION_SCOPE =&amp;gt; '0'}   &lt;BR /&gt;1 row(s) in 0.2240 seconds&lt;/P&gt;&lt;P&gt;&lt;A 
href="https://community.hortonworks.com/questions/46350/how-much-actual-space-required-to-store-10gb-to-hd.html"&gt;https://community.hortonworks.com/questions/46350/how-much-actual-space-required-to-store-10gb-to-hd.html&lt;/A&gt;&lt;BR /&gt;mentions higher disk usage in HBase.&lt;BR /&gt;&lt;A href="http://blog.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/"&gt;http://blog.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/&lt;/A&gt;&lt;BR /&gt;mentions a doubling of disk usage during compactions.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Could someone please clarify why the HBase table is using more than 8x the space in HDFS compared to the actual data stored in the table?&lt;/P&gt;&lt;BR /&gt;&lt;P&gt;What am I missing here?&lt;/P&gt;</description>
      <pubDate>Sat, 10 Mar 2018 03:44:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Understanding-HBase-HDFS-usage/m-p/176926#M75622</guid>
      <dc:creator>pheinzlr</dc:creator>
      <dc:date>2018-03-10T03:44:10Z</dc:date>
    </item>
    <item>
      <title>Re: Understanding HBase HDFS usage</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Understanding-HBase-HDFS-usage/m-p/176927#M75623</link>
      <description>&lt;P&gt;Please read the following:&lt;/P&gt;&lt;P&gt;&lt;A href="https://blogs.apache.org/hbase/entry/the_effect_of_columnfamily_rowkey" target="_blank"&gt;https://blogs.apache.org/hbase/entry/the_effect_of_columnfamily_rowkey&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="http://hadoop-hbase.blogspot.com/2016/02/hbase-compression-vs-blockencoding_17.html" target="_blank"&gt;http://hadoop-hbase.blogspot.com/2016/02/hbase-compression-vs-blockencoding_17.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 11 Mar 2018 08:01:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Understanding-HBase-HDFS-usage/m-p/176927#M75623</guid>
      <dc:creator>tyu</dc:creator>
      <dc:date>2018-03-11T08:01:09Z</dc:date>
    </item>
  </channel>
</rss>

