Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Pre-splitting HBase table not working

New Member

I created an HBase table using the commands below and pre-split it into 20 regions:

hbase org.apache.hadoop.hbase.util.RegionSplitter test_rec_a UniformSplit -c 20 -f rec

alter 'test_rec_a', {METHOD => 'table_att', CONFIGURATION => {'SPLIT_POLICY' => 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'}}, {NAME => 'rec', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '5184000', BLOCKSIZE => '65536', IN_MEMORY => 'true', BLOCKCACHE => 'true', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false'}

enable 'test_rec_a'

HBase config: max region size: 10 GB

I ran 3 bulk-load jobs with 2 GB, 9 GB, and 80 GB files; the files contain all unique keys.

I was expecting the jobs to load data into all 20 regions, but the data was loaded into a single region only. Is there something I am missing here?

I want to pre-split the table into 20 regions, but I don't know the key distribution because the keys are hashed.

Is there a way to pre-split without knowing the key distribution, or is not pre-splitting the right option?
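For context, UniformSplit divides the full byte keyspace into equal-sized ranges, so it only balances load when row keys are spread evenly over that space. Below is a minimal Python sketch (not from the thread; the key formats are illustrative assumptions) of why uniformly hashed keys land in all 20 regions while keys sharing an ASCII prefix collapse into one:

```python
import hashlib

def uniform_split_points(num_regions, key_len=8):
    """Evenly spaced split points over the byte keyspace,
    roughly what RegionSplitter's UniformSplit produces."""
    max_key = (1 << (8 * key_len)) - 1
    return [(max_key * i // num_regions).to_bytes(key_len, "big")
            for i in range(1, num_regions)]

def region_for(key, splits):
    """Index of the region a row key falls into: the first region
    whose upper split point is greater than the key."""
    for i, point in enumerate(splits):
        if key < point:
            return i
    return len(splits)  # last region

splits = uniform_split_points(20)

# Hashed keys (here, first 8 bytes of an MD5 digest) spread evenly.
counts = [0] * 20
for i in range(10000):
    key = hashlib.md5(str(i).encode()).digest()[:8]
    counts[region_for(key, splits)] += 1

# ASCII-prefixed keys all start with the same bytes ("user..."),
# so they fall into the narrow slice of keyspace covering that prefix.
ascii_counts = [0] * 20
for i in range(10000):
    key = f"user{i:06d}".encode()[:8]
    ascii_counts[region_for(key, splits)] += 1

print(min(counts), max(counts))           # roughly balanced
print(sum(1 for c in ascii_counts if c))  # only a tiny number of regions hit
```

If bulk loads still land in one region despite hashed keys, it is worth dumping a few actual row keys from the HFiles to confirm they really cover the whole byte range rather than a printable-hex subset of it.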

thanks

1 ACCEPTED SOLUTION

Rising Star

First of all, do you see 20 regions in the Web UI? If yes, check the data distribution per region (for every region you can get the total store size). You are probably hitting a single region because your keys are skewed. If you do not know the key distribution, it does not make sense to pre-split the table; leave it to HBase.


2 REPLIES


Super Guru

I would also say that you should understand the data you are loading well enough to create reasonable split points. Even if the keys are hashed, you should be able to determine what the first byte/character of the row key is and create reasonable split points (using RegionSplitter or by hand).
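One way to do this "by hand" is to sample row keys from the input files and pick split points at even percentiles of the sorted sample, so each region receives a similar share of keys. A sketch in Python, where the MD5-derived sample stands in for keys you would actually read from your data (an assumption for illustration):

```python
import hashlib

def split_points_from_sample(sample_keys, num_regions):
    """Pick split points at evenly spaced percentiles of a sorted
    sample of row keys, so each region gets a similar share."""
    keys = sorted(sample_keys)
    step = len(keys) / num_regions
    return [keys[int(i * step)] for i in range(1, num_regions)]

# Hypothetical sample: first 8 bytes of MD5-hashed natural keys,
# standing in for row keys sampled from the real input files.
sample = [hashlib.md5(str(i).encode()).digest()[:8] for i in range(5000)]
splits = split_points_from_sample(sample, 20)

# Hex-encoded split points, e.g. for building a pre-split create
# statement or a split file by hand.
print([s.hex() for s in splits])
```

The resulting split points could then be supplied when creating the table (the HBase shell's `create` accepts an explicit `SPLITS` list), giving regions sized by the observed key distribution rather than by the raw byte range.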