Understanding HBase HDFS usage

Using HDP-2.6.0.3, I ran

hbase org.apache.hadoop.hbase.util.LoadTestTool -compression NONE -write 8:8 -num_keys 1048576


This generated an HBase table with the following characteristics:

1048576 rows
row key length 39 bytes
8 columns/row with a mean size of 8 bytes each

That should add up to a storage requirement of approximately

1048576*(39+8*8) = 108003328 bytes =~ 103 MB
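The estimate above can be reproduced with a short sketch. The function name and constants are illustrative only, and the model is deliberately naive: it assumes each row stores its key once plus its values, which is the assumption behind the figure in the question, not a description of how HBase lays data out on disk.

```python
# Naive size estimate from the question: row key plus cell values only.
# All names here are hypothetical helpers, not part of any HBase API.
NUM_ROWS = 1_048_576     # -num_keys 1048576
ROW_KEY_BYTES = 39       # row key length stated in the question
COLUMNS_PER_ROW = 8      # -write 8:8 (mean columns per row)
VALUE_BYTES = 8          # -write 8:8 (mean value size in bytes)


def naive_table_bytes(rows, key_bytes, columns, value_bytes):
    """Bytes needed if each row held its key once plus its values."""
    return rows * (key_bytes + columns * value_bytes)


estimate = naive_table_bytes(NUM_ROWS, ROW_KEY_BYTES, COLUMNS_PER_ROW, VALUE_BYTES)
print(estimate)               # 108003328
print(estimate / 1024 ** 2)   # 103.0 (MB in binary units)
```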

When I check the storage usage for that HBase table in HDFS:

hdfs dfs -du -h -s /apps/hbase/data/data/default/cluster_test

gives

853.7 M  /apps/hbase/data/data/default/cluster_test
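To put the reported figure next to the estimate, here is a rough sketch of the arithmetic. It assumes that the `M` printed by `-h` is a binary unit (1 M = 1024 * 1024 bytes), and the variable names are illustrative only.

```python
# Compare the observed HDFS usage with the naive estimate from the question.
# Assumption: "853.7 M" from `hdfs dfs -du -h` means 853.7 * 1024 * 1024 bytes.
NUM_ROWS = 1_048_576
COLUMNS_PER_ROW = 8
NAIVE_ESTIMATE_BYTES = 108_003_328   # 1048576 * (39 + 8 * 8)

observed_bytes = 853.7 * 1024 ** 2

bytes_per_row = observed_bytes / NUM_ROWS
bytes_per_cell = bytes_per_row / COLUMNS_PER_ROW
ratio = observed_bytes / NAIVE_ESTIMATE_BYTES

print(round(bytes_per_row, 1))    # 853.7
print(round(bytes_per_cell, 1))   # 106.7
print(round(ratio, 2))            # 8.29
```

Under that assumption each row occupies about 854 bytes on disk, or roughly 107 bytes per cell, against the 103 bytes per row assumed in the estimate.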


I have an HDFS replication factor of 3, but

hdfs dfs -du

should report the disk usage before replication anyway.

HBase Region replication for the table is 1:

hbase(main):001:0> describe 'cluster_test'
Table cluster_test is ENABLED
cluster_test, {TABLE_ATTRIBUTES => {DURABILITY => 'USE_DEFAULT', REGION_REPLICAT
ION => '1'}
COLUMN FAMILIES DESCRIPTION
{NAME => 'test_cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE =>
'65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.2240 seconds

https://community.hortonworks.com/questions/46350/how-much-actual-space-required-to-store-10gb-to-hd...
mentions higher disk usage in HBase, and
http://blog.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/
mentions a doubling of disk usage during compactions.


Could someone please clarify why the HBase table data uses more than 8x the space in HDFS (853.7 MB versus an estimated 103 MB) compared to the actual data stored in the table?


What am I missing here?

1 ACCEPTED SOLUTION