Understanding HBase HDFS usage

Using HDP-2.6.0.3 I ran

hbase org.apache.hadoop.hbase.util.LoadTestTool -compression NONE -write 8:8 -num_keys 1048576


generating an HBase table with the following characteristics:

1048576 rows
row key length 39 bytes
8 columns/row with a mean size of 8 bytes each

that should sum up to a storage requirement of approximately

1048576*(39+8*8) = 108003328 bytes =~ 103 MB
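The estimate above can be reproduced with a short sketch. It assumes, as the calculation does, that each row stores its row key once plus the raw bytes of its values; the function name is made up for illustration.

```python
# Naive estimate: row key counted once per row, plus the raw value bytes.
NUM_ROWS = 1_048_576
ROW_KEY_BYTES = 39
COLUMNS_PER_ROW = 8
VALUE_BYTES = 8

def naive_table_bytes(rows, key_bytes, columns, value_bytes):
    """Bytes needed if only keys (once per row) and values were stored."""
    return rows * (key_bytes + columns * value_bytes)

estimate = naive_table_bytes(NUM_ROWS, ROW_KEY_BYTES, COLUMNS_PER_ROW, VALUE_BYTES)
estimate_mib = estimate / 2**20

print(estimate, estimate_mib)  # 108003328 103.0
```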

When I check the storage usage for that HBase table in HDFS:

hdfs dfs -du -h -s /apps/hbase/data/data/default/cluster_test

gives

853.7 M  /apps/hbase/data/data/default/cluster_test


I have an HDFS replication factor of 3; however,

hdfs dfs -du

should report the disk usage "before" replication anyway.

HBase Region replication for the table is 1:

hbase(main):001:0> describe 'cluster_test'
Table cluster_test is ENABLED
cluster_test, {TABLE_ATTRIBUTES => {DURABILITY => 'USE_DEFAULT', REGION_REPLICAT
ION => '1'}
COLUMN FAMILIES DESCRIPTION
{NAME => 'test_cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE =>
'65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.2240 seconds

https://community.hortonworks.com/questions/46350/how-much-actual-space-required-to-store-10gb-to-hd...
mentions higher disk usage in HBase.
http://blog.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/

mentions doubling of disk usage during compactions.


Could someone please clarify why the HBase table uses roughly 8x as much space in HDFS (853.7 MB versus about 103 MB) as the data actually stored in the table?


What am I missing here?
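One way to sanity-check the gap is to redo the estimate per cell rather than per row. HBase stores every cell as a KeyValue that repeats the row key, column family, qualifier, timestamp, and type alongside the value. The sketch below is a rough model under that layout, not a measurement. The qualifier length is an assumed placeholder, and tags, MVCC information, block indexes, and bloom filters are ignored, so it is not expected to account for the whole difference.

```python
# Rough per-cell size under HBase's KeyValue layout (tags and MVCC info ignored).
NUM_ROWS = 1_048_576
COLUMNS_PER_ROW = 8
ROW_KEY_BYTES = 39
FAMILY_BYTES = len("test_cf")  # 7, family name taken from the describe output
QUALIFIER_BYTES = 1            # ASSUMPTION: placeholder, not measured
VALUE_BYTES = 8

def keyvalue_bytes(row, family, qualifier, value):
    """Size of one serialized KeyValue, given the lengths of its parts."""
    # Fixed fields: key length (4) + value length (4) + row length (2)
    # + family length (1) + timestamp (8) + key type (1).
    fixed = 4 + 4 + 2 + 1 + 8 + 1
    return fixed + row + family + qualifier + value

cell = keyvalue_bytes(ROW_KEY_BYTES, FAMILY_BYTES, QUALIFIER_BYTES, VALUE_BYTES)
total = NUM_ROWS * COLUMNS_PER_ROW * cell

print(cell)           # 75 bytes per cell, of which only 8 are the value
print(total / 2**20)  # 600.0
```

Under these assumptions, repeating the key in every cell already brings the estimate to roughly 600 MB, which is much closer to the observed 853.7 MB than the 103 MB figure.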
