Created 07-13-2016 01:21 AM
We have a table.
The table size was 8.7 GB and it had 5 regions.
We ran a major compaction on the table.
The size increased to 21.7 GB and after some time came back down to 8.7 GB, but the number of regions increased from 5 to 27, then dropped to 17, and never returned to 5.
Why did the number of regions increase from 5 to 17 even though the size of the data stayed the same?
Created 07-13-2016 02:36 PM
@sunny malik, are you still using the org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy split policy? Look at hbase.regionserver.region.split.policy in hbase-site.xml.
Try using the ConstantSizeRegionSplitPolicy instead, which will only split a region once it reaches the configured maximum size (10GB in your case). The IncreasingToUpperBoundRegionSplitPolicy splits more aggressively in the beginning, when a table has few regions, and moves toward larger regions as the number of regions for the table increases.
You can find more information in the HBase Book
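To make the difference concrete, here is a rough sketch of the size check each policy applies, as I understand it from recent HBase releases. The function names are made up for illustration, the 128MB flush size and 10GB max file size are assumed defaults, and older releases used a different formula, so check the source for your version before relying on the numbers.

```python
MB = 1024 ** 2
GB = 1024 ** 3

def increasing_to_upper_bound_threshold(regions_on_server,
                                        max_file_size=10 * GB,
                                        flush_size=128 * MB):
    """Approximate split threshold for IncreasingToUpperBoundRegionSplitPolicy.

    regions_on_server: regions of this table hosted on the same RegionServer.
    Assumption: initial size is 2 * memstore flush size, grown by count cubed.
    """
    if regions_on_server == 0 or regions_on_server > 100:
        return max_file_size
    initial_size = 2 * flush_size
    return min(max_file_size, initial_size * regions_on_server ** 3)

def constant_size_threshold(max_file_size=10 * GB):
    """ConstantSizeRegionSplitPolicy always compares against the max file size."""
    return max_file_size

# Show how quickly the threshold climbs toward the 10GB upper bound.
for n in range(1, 6):
    print(n, increasing_to_upper_bound_threshold(n) / GB, "GB")
```

Under these assumptions a table with a single region on a server becomes eligible to split at about 256MB, which is why a table well under 10GB can still end up with several regions.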
Created 07-13-2016 01:53 AM
The size of a table, in bytes, is not necessarily tied to the number of regions. For example, a change in configuration might result in more or fewer regions for the same amount of data.
I don't have any definitive explanation why you saw the number of regions spike to 27; it might have just been transient. The number of regions likely increased from 5 to 17 due to splitting of the regions in this table as a part of the compaction.
You can investigate the RegionServer and Master logs on your cluster for the given table to understand if the regions underwent any splits. There are many reasons that the number of regions might have increased -- it is hard to definitively say why given the information you provided so far.
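As a starting point for that log search, a minimal sketch: it just keeps lines that mention both the table name and the word "split", without assuming any particular HBase log message format. The function name, log path, and table name are placeholders, not anything HBase provides.

```python
def find_split_lines(lines, table_name):
    """Return log lines that mention both the table name and the word 'split'.

    This is a plain case-insensitive substring filter; it does not parse
    or depend on any specific HBase log format.
    """
    table = table_name.lower()
    hits = []
    for line in lines:
        lowered = line.lower()
        if table in lowered and "split" in lowered:
            hits.append(line.rstrip("\n"))
    return hits

# Usage (hypothetical path and table name):
# with open("/path/to/regionserver.log", errors="replace") as fh:
#     for hit in find_split_lines(fh, "my_table"):
#         print(hit)
```

Running it over both the RegionServer and Master logs and comparing timestamps with the compaction time should show whether the regions were split around then.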
I would not be worried about 17 regions instead of only 5.
Created 07-13-2016 02:28 PM
Hi Josh
Thanks for the reply.
I have a table in production that holds only 1TB of data, and max.region.size is set to 10GB.
I would expect somewhere in the range of 100 - 200 regions for this dataset, but I see ~800 regions for the table.
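The rough arithmetic behind that 100 - 200 estimate, as a sketch. It assumes regions sit somewhere between half full (right after a split) and completely full, which is my simplification and not something HBase guarantees; the function name is only for illustration.

```python
import math

GB = 1024 ** 3
TB = 1024 ** 4

def expected_region_range(table_bytes, max_region_bytes):
    """Return (fewest, most) regions if every region is between half full and full."""
    fewest = math.ceil(table_bytes / max_region_bytes)  # all regions full
    most = 2 * fewest                                    # all regions half full
    return fewest, most

low, high = expected_region_range(1 * TB, 10 * GB)
print(low, high)  # 103 206
```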
----------
I created a demo table and shared the results.
In the test:
The data size is 9GB, which is less than the max region size (10GB), with 5 regions.
Why did the table grow to 5 regions in the first place when 1 region was enough? The table was not pre-split.
Before the major compaction, all 5 regions held less than 10GB and no new data was added. Why would a major compaction increase the number of regions? It should only have tried to merge the HFiles into a single HFile in each of the 5 regions.
Any information or explanation will help.
thanks
Created 07-14-2016 12:07 AM
Thanks, guys, for sharing your thoughts.
We were using IncreasingToUpperBoundRegionSplitPolicy and have now changed it to ConstantSizeRegionSplitPolicy.
That solved the mystery.
Thanks again for the help!
Created 04-15-2020 05:10 AM
I am using ConstantSizeRegionSplitPolicy and MaxFileSize is set to 30 GB.
However, I found that files are not split across regions when the file size reaches 30 GB.
Some of my files are 300 GB in particular regions.
Can you please help me solve this problem?
I have a huge volume of data, 10 TB.
Created 07-17-2016 11:04 PM
Thanks mqureshi for the reference doc; that is exactly what happened.
............
I created a new table and loaded data 6 times, which created a single region for the table with 6 HFiles.
The total size of the table was 24.2GB, with a 10GB region limit.
I ran a major compaction; it created 12 new regions and deleted the parent region.
............
It looks like, whenever a split happens, new regions are created according to a formula.
My guess at the formula:
new regions added after split = ~(number of HFiles * 2) - 1 (the original region is removed)
or
is there an actual way to get the number of regions after a split?
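Here is the guess above written out as a tiny function so it can be checked against other tables. This is only my guess, fitted to the single observation in this post; it is not how HBase actually decides splits, and the function name is just for illustration.

```python
def guessed_regions_after_split(num_hfiles, regions_before=1):
    """Guess: a split creates 2 regions per HFile and removes the parent region.

    Not HBase's algorithm, only a formula fitted to one observation.
    """
    new_regions = 2 * num_hfiles
    return regions_before - 1 + new_regions

# 6 HFiles in a single region, as in the test described in this post.
print(guessed_regions_after_split(6))  # 12
```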