Created 07-17-2016 08:28 PM
I created an HBase table using the commands below and split it into 20 regions:
hbase org.apache.hadoop.hbase.util.RegionSplitter test_rec_a UniformSplit -c 20 -f rec
alter 'test_rec_a', {METHOD => 'table_att', CONFIGURATION => {'SPLIT_POLICY' => 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'}}, {NAME=>'rec', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL =>'5184000', BLOCKSIZE => '65536', IN_MEMORY => 'true', BLOCKCACHE => 'true',MIN_VERSIONS => '0',KEEP_DELETED_CELLS => 'false'}
enable 'test_rec_a'
HBase config: max region size is 10 GB.
I ran 3 bulk load jobs with 2 GB, 9 GB, and 80 GB files. All files contain unique keys.
I expected the jobs to load data into all 20 regions, but all the data was loaded into a single region. Is there something I am missing?
I want to pre-split the table into 20 regions, but I don't know the key distribution because the keys are hashed.
Is there a way to pre-split without knowing the key distribution, or is not pre-splitting the right option?
Thanks
Created 07-17-2016 08:59 PM
First of all, do you see 20 regions in the Web UI? If yes, check the data distribution per region (for every region you can see the total store size). You are probably hitting a single region because your row keys are skewed. If you do not know the key distribution, it does not make sense to pre-split the table; leave it to HBase.
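If you can export the table's split keys and a sample of row keys, you can also do the per-region check offline by bucketing each key with a binary search. The sketch below is a minimal illustration with hypothetical data: the one-byte boundaries only approximate UniformSplit (they are not HBase's exact split keys), and the row keys are assumed to be MD5 digests stored as lowercase hex strings. Under those assumptions every key lands in just 4 of the 20 regions, because hex characters occupy only two narrow byte ranges (0x30-0x39 and 0x61-0x66).

```python
import bisect
import hashlib
from collections import Counter


def region_index(split_keys, row_key):
    # split_keys are the sorted region start keys, excluding the empty start
    # key of the first region. A key equal to a split key belongs to the
    # region that starts at it, which is exactly what bisect_right returns.
    return bisect.bisect_right(split_keys, row_key)


def keys_per_region(split_keys, row_keys):
    # Count how many sample row keys fall into each of len(split_keys) + 1 regions.
    counts = Counter(region_index(split_keys, k) for k in row_keys)
    return [counts.get(i, 0) for i in range(len(split_keys) + 1)]


def one_byte_uniform_splits(num_regions):
    # HYPOTHETICAL approximation of UniformSplit: evenly spaced one-byte
    # boundaries over 0x00-0xFF. HBase computes its own, longer split keys.
    return [bytes([i * 256 // num_regions]) for i in range(1, num_regions)]


if __name__ == "__main__":
    splits = one_byte_uniform_splits(20)
    # Hypothetical row keys: MD5 digests stored as lowercase hex strings.
    row_keys = [hashlib.md5(str(i).encode()).hexdigest().encode() for i in range(10000)]
    counts = keys_per_region(splits, row_keys)
    # Indexes of the regions that received any data.
    print([i for i, c in enumerate(counts) if c])  # prints [3, 4, 7, 8]
```

Swap in your real split keys and sampled row keys to see whether the load is concentrated in one region.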
Created 07-17-2016 09:31 PM
I would also say that you should understand the data you are loading so that you can create reasonable split points. Even if the keys are hashed, you should be able to tell what the first byte/character of the row key can be and create reasonable split points from that (using RegionSplitter or by hand).
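As a sketch of the "by hand" option, assuming (hypothetically) that the hashed keys are lowercase hex strings, you can space the split points evenly over a fixed-width hex prefix instead of over the raw byte range. The helper names and the prefix width of 4 are illustrative choices, not part of HBase; RegionSplitter's built-in HexStringSplit algorithm targets the same hex-key case.

```python
import bisect
import hashlib
from collections import Counter


def hex_split_points(num_regions, width=4):
    # Evenly spaced split keys over all lowercase hex prefixes of `width` chars.
    # Assumes row keys are hex strings; for raw binary hashes, split the
    # first byte range instead.
    space = 16 ** width
    return [format(i * space // num_regions, "0{}x".format(width)).encode("ascii")
            for i in range(1, num_regions)]


def hbase_shell_splits(split_keys):
    # Render the split keys as a list literal for SPLITS => [...] in the HBase shell.
    return "[" + ", ".join("'{}'".format(k.decode("ascii")) for k in split_keys) + "]"


if __name__ == "__main__":
    splits = hex_split_points(20)
    # Hypothetical sample: MD5 hex keys, bucketed into regions with bisect.
    row_keys = [hashlib.md5(str(i).encode()).hexdigest().encode() for i in range(100000)]
    counts = Counter(bisect.bisect_right(splits, k) for k in row_keys)
    print("regions with data:", len(counts))
    print(hbase_shell_splits(splits))
```

The printed list can be pasted into `create 'test_rec_a', 'rec', SPLITS => [...]` when creating the table; with evenly distributed hex keys each of the 20 regions should receive roughly 1/20 of the data.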