
Pre-splitting HBase table not working

Explorer

I created an HBase table using the commands below and pre-split it into 20 regions:

hbase org.apache.hadoop.hbase.util.RegionSplitter test_rec_a UniformSplit -c 20 -f rec

alter 'test_rec_a', {METHOD => 'table_att', CONFIGURATION => {'SPLIT_POLICY' => 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'}}, {NAME => 'rec', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '5184000', BLOCKSIZE => '65536', IN_MEMORY => 'true', BLOCKCACHE => 'true', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false'}

enable 'test_rec_a'

HBase config: max region size: 10 GB (hbase.hregion.max.filesize)

I ran 3 bulk load jobs with 2 GB, 9 GB, and 80 GB files; the files have all unique keys.
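
For reference, a typical bulk load invocation looks like this (a sketch; the HFile directory path below is a placeholder, not the path actually used for these jobs):

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /path/to/hfile/output test_rec_a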

I was expecting the jobs to run and load data into all 20 regions, but the data was loaded into a single region only. Is there something I am missing here?

I am looking to pre-split the table into 20 regions, but I don't know the key distribution because the keys are hashed.

Is there a way to pre-split without knowing the key distribution, or is not pre-splitting the right option?

thanks

1 ACCEPTED SOLUTION

Rising Star

First of all, do you see 20 regions in the Web UI? If yes, check the data distribution per region (for every region you can get the total store size). You are probably hitting a single region because your row keys are skewed relative to the split points: for example, UniformSplit places split points across the full 0x00-0xFF byte range, but hex-string keys start with ASCII bytes 0x30-0x66, so most of those regions never receive data. If you do not know the key distribution, it does not make sense to pre-split the table; leave it to HBase.
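
A quick way to check the per-region distribution from the command line is to list the store sizes on HDFS (a sketch, assuming the default HBase root directory /hbase and the default namespace; each subdirectory is one region):

hdfs dfs -du -h /hbase/data/default/test_rec_a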


2 REPLIES


Super Guru

I would also say that you should understand the data you are loading so that you can create reasonable split points. Even if the keys are hashed, you should be able to work out what the first byte/character of the row key can be and create reasonable split points (using RegionSplitter or by hand).
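
For example, if the row keys are hex-encoded hashes (characters 0-9 and a-f), HexStringSplit generates split points that match that key space, whereas UniformSplit assumes keys cover the full raw byte range. A sketch reusing the table and column family names from the question:

# pre-split with RegionSplitter using HexStringSplit
hbase org.apache.hadoop.hbase.util.RegionSplitter test_rec_a HexStringSplit -c 20 -f rec

# or equivalently, straight from the HBase shell
create 'test_rec_a', 'rec', {NUMREGIONS => 20, SPLITALGO => 'HexStringSplit'}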