Created 03-09-2018 07:44 PM
Using HDP-2.6.0.3, I ran
hbase org.apache.hadoop.hbase.util.LoadTestTool -compression NONE -write 8:8 -num_keys 1048576
generating an HBase table with the following characteristics:
1048576 rows
row key length 39 bytes
8 columns/row with a mean size of 8 bytes each
which should add up to a raw payload size of approximately
1048576 * (39 + 8*8) = 108003328 bytes ≈ 103 MB
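For reference, the back-of-the-envelope estimate above can be reproduced in a few lines of Python (just the arithmetic, nothing HBase-specific):

```python
# Naive payload estimate for the LoadTestTool table described above.
num_rows = 1_048_576       # -num_keys 1048576
row_key_len = 39           # bytes per row key
cols_per_row = 8           # -write 8:8
mean_value_len = 8         # mean bytes per column value

payload_bytes = num_rows * (row_key_len + cols_per_row * mean_value_len)
print(payload_bytes)            # 108003328
print(payload_bytes / 2**20)    # 103.0 (MiB)

observed_mib = 853.7            # from hdfs dfs -du -h -s below
print(observed_mib / (payload_bytes / 2**20))  # ~8.29x
```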
When I check the storage usage for that HBase table in HDFS:
hdfs dfs -du -h -s /apps/hbase/data/data/default/cluster_test
gives
853.7 M /apps/hbase/data/data/default/cluster_test
I have an HDFS replication factor of 3; however,
hdfs dfs -du
should report the disk usage "before" replication anyway.
HBase Region replication for the table is 1:
hbase(main):001:0> describe 'cluster_test'
Table cluster_test is ENABLED
cluster_test, {TABLE_ATTRIBUTES => {DURABILITY => 'USE_DEFAULT', REGION_REPLICAT
ION => '1'}
COLUMN FAMILIES DESCRIPTION
{NAME => 'test_cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE =>
'65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.2240 seconds
https://community.hortonworks.com/questions/46350/how-much-actual-space-required-to-store-10gb-to-hd...
mentions higher disk usage in HBase.
http://blog.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/
mentions that disk usage can double temporarily during compactions.
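Part of the gap may come from HBase's on-disk cell format: every cell in an HFile stores the full row key, column family, qualifier, timestamp and type byte again, not just the 8-byte value. Below is a rough sketch of a per-cell estimate based on the KeyValue layout from the HBase reference guide; the 4-byte qualifier length is my assumption, since LoadTestTool's actual qualifier names aren't shown above:

```python
# Rough per-cell size for HBase's KeyValue on-disk layout:
#   keylen(4) + vallen(4) + rowlen(2) + row + cflen(1) + cf
#   + qualifier + timestamp(8) + type(1) + value
row_key_len = 39
cf_len = len("test_cf")    # 7, from the table description
qualifier_len = 4          # assumption: short qualifier names
value_len = 8

cell_bytes = (4 + 4 + 2 + row_key_len + 1 + cf_len
              + qualifier_len + 8 + 1 + value_len)
print(cell_bytes)          # 78

rows, cols = 1_048_576, 8
print(rows * cols * cell_bytes / 2**20)  # 624.0 (MiB)
```

Under those assumptions a single cell costs roughly 78 bytes rather than 8, which already puts the table in the hundreds of MB before block indexes and other HFile metadata. This is only an illustration, not an exact accounting of the 853.7 MB observed.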
Could someone please clarify why the HBase table uses nearly 9x the space in HDFS compared to the raw data stored in the table?
What am I missing here?