
Number of regions increases after major compaction

Explorer

We have a table whose size was 8.7 GB, with 5 regions.

We ran a major compaction on the table.

The size increased to 21.7 GB, but after some time it came back down to 8.7 GB as before. However, the number of regions increased from 5 to 27, then dropped to 17, and never came back down to 5.

Why did the number of regions increase from 5 to 17 even though the size of the data remained the same?

1 ACCEPTED SOLUTION

@sunny malik, are you still using the org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy split policy? Look at hbase.regionserver.region.split.policy in hbase-site.xml.

Try using ConstantSizeRegionSplitPolicy instead, which will only split a region once it reaches 10 GB. IncreasingToUpperBoundRegionSplitPolicy splits more aggressively in the beginning, slowing down toward larger regions as the number of regions for the table increases.

You can find more information in the HBase Book.
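For reference, the split threshold that IncreasingToUpperBoundRegionSplitPolicy applies can be sketched as follows. This is a minimal illustration, not the actual HBase Java code, and it assumes current defaults as described in the HBase Book: a 128 MB memstore flush size, a 10 GB max file size, an initial size of twice the flush size, and the cubic formula used by recent HBase versions.

```python
# Illustrative sketch of the IncreasingToUpperBoundRegionSplitPolicy threshold.
# Assumptions (not from this thread): default hbase.hregion.memstore.flush.size
# (128 MB), default hbase.hregion.max.filesize (10 GB), and the cubic formula
# from the HBase Book. The real logic lives in HBase's Java sources.

MB = 1024 ** 2
GB = 1024 ** 3

FLUSH_SIZE = 128 * MB          # hbase.hregion.memstore.flush.size (default)
MAX_FILE_SIZE = 10 * GB        # hbase.hregion.max.filesize (default)
INITIAL_SIZE = 2 * FLUSH_SIZE  # policy's default initial size (2 * flush size)

def split_threshold(regions_on_server: int) -> int:
    """Region size at which the policy requests a split, given how many
    regions of the table already live on the same RegionServer."""
    if regions_on_server == 0 or regions_on_server > 100:
        return MAX_FILE_SIZE
    return min(MAX_FILE_SIZE, INITIAL_SIZE * regions_on_server ** 3)

for r in range(1, 6):
    print(f"{r} region(s) on server -> split at {split_threshold(r) / GB:.2f} GB")
```

With these defaults, the first few splits happen at roughly 0.25 GB, 2 GB, and 6.75 GB before the 10 GB cap takes over, which is why a table well under 10 GB can already have several regions.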


6 REPLIES

The size of a table, in bytes, is not necessarily tied to its number of regions. For example, a change in configuration might result in more or fewer regions for the same amount of data.

I don't have any definitive explanation why you saw the number of regions spike to 27; it might have just been transient. The number of regions likely increased from 5 to 17 due to splitting of the regions in this table as a part of the compaction.

You can investigate the RegionServer and Master logs on your cluster for the given table to understand if the regions underwent any splits. There are many reasons that the number of regions might have increased -- it is hard to definitively say why given the information you provided so far.

I would not be worried about 17 regions instead of only 5.

Explorer

Hi Josh

Thanks for the reply.

I have a table in production that holds only 1 TB of data, and max.region.size is set to 10 GB.

I would expect regions in the range of 100-200 for this dataset, but I see that the table has ~800 regions.
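The arithmetic behind that expectation can be made explicit. This is just a back-of-the-envelope sketch using the numbers quoted above (1 TB of data, 10 GB max region size, ~800 observed regions):

```python
# Back-of-the-envelope arithmetic for the production table described above.
data_gb = 1024.0               # 1 TB of data, expressed in GB
max_region_gb = 10.0           # configured max region size
observed_regions = 800

expected_regions = data_gb / max_region_gb     # if regions filled to the max
avg_region_gb = data_gb / observed_regions     # what the cluster actually shows

print(f"expected if regions fill to max: ~{expected_regions:.0f}")
print(f"average observed region size:    ~{avg_region_gb:.2f} GB")
```

An average region of roughly 1.3 GB, far below the 10 GB ceiling, is consistent with a split policy that splits regions long before they reach the max file size.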

----------

I created a demo table and here are the results.

In the test:

The data size is 9 GB, less than the max region size (10 GB), with 5 regions.

Why did the table grow to 5 regions in the first place, when 1 region would have been enough? No pre-splitting of the table was done.

Before the major compaction, all 5 regions held less than 10 GB of data and no new data was added. Why would a major compaction increase the number of regions? It should only have merged the HFiles into a single HFile for each of the 5 regions.

Any information or explanation will help.

Thanks


Explorer

Thanks, guys, for sharing your thoughts.

We were using IncreasingToUpperBoundRegionSplitPolicy and have now changed it to ConstantSizeRegionSplitPolicy.

That solved the mystery.

Thanks for the help again!

New Contributor

I am using ConstantSizeRegionSplitPolicy, and MaxFileSize is set to 30 GB.

However, I found that regions are not being split when they reach 30 GB.

Some of my regions are 300 GB.

Can you please help me solve this problem?

I have a huge volume of data, about 10 TB.

Explorer

Thanks, mqureshi, for the reference doc; that is exactly what happened.

............

I created a new table and loaded the data 6 times, which created a single region for the table with 6 HFiles.

The total size of the table was 24.2 GB, with a 10 GB region limit.

I ran a major compaction, and it created 12 new regions and deleted the parent region.

............

It looks like whenever a split happens, new regions are created according to a formula.

My guess at the formula:

new regions added after split = ~(number of HFiles * 2) - 1 (the original region is removed)

Or:

Is there an actual way to predict the number of regions after a split?
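One purely size-based way to estimate the count is to assume a region splits in half whenever it exceeds the configured limit, recursively. This is only a rough model, not how HBase actually decides (real splits pick a midpoint row key, so daughters are rarely equal, and the split policy may use a lower threshold than the max file size):

```python
# A simple size-based model of region splitting: any region larger than the
# limit keeps splitting in half until every piece fits under the limit.
# This is an estimate only, not HBase's actual split logic.

def regions_after_splits(region_gb: float, limit_gb: float) -> int:
    """Number of regions a single region of `region_gb` GB ends up as,
    if every region over `limit_gb` recursively splits in half."""
    if region_gb <= limit_gb:
        return 1
    half = region_gb / 2
    return 2 * regions_after_splits(half, limit_gb)

print(regions_after_splits(24.2, 10))   # -> 4
```

Note that this model predicts 4 regions for the 24.2 GB test above, not the 12 observed. That gap suggests the splits were driven by a lower threshold, such as IncreasingToUpperBoundRegionSplitPolicy's flush-size-based one, rather than by the 10 GB limit alone.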