Member since: 07-13-2016
Posts: 7
Kudos Received: 2
Solutions: 0
04-15-2020 05:10 AM
I am using ConstantSizeRegionSplitPolicy with MaxFileSize set to 30 GB, but regions are not being split when their file size reaches 30 GB. Some regions hold as much as 300 GB. Can you please help me solve this problem? I have a huge volume of data (10 TB).
08-19-2016 12:12 AM (1 Kudo)
Yes, with IncreasingToUpperBoundRegionSplitPolicy a region can be split long before it reaches the maximum size; this is expected behavior. The reason is that HBase tries to create many regions while they are still small, so that they can be distributed across the cluster. You will need to switch to ConstantSizeRegionSplitPolicy if you do not want this. hbase.regionserver.region.split.policy in hbase-site.xml sets the default policy for the cluster; the policy can also be overridden per table in the table descriptor.
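To make the early splits concrete, here is a rough sketch of how I understand the split threshold to be computed by IncreasingToUpperBoundRegionSplitPolicy in recent HBase versions: the smaller of the configured maximum file size and (2 x memstore flush size) x regionCount^3, where regionCount is the number of regions of the same table on that region server. The class and method names below are made up for illustration, this is not the HBase source, and the exact formula may differ in your version, so please check it against your release.

```java
// Illustrative sketch only; SplitSizeSketch and sizeToCheck are hypothetical names,
// not HBase classes. The formula is an assumption based on recent HBase versions.
final class SplitSizeSketch {

    // Returns the approximate size (in bytes) at which a region would be split.
    // regionCount: regions of this table hosted on the same region server.
    // flushSize:   memstore flush size in bytes (hbase.hregion.memstore.flush.size).
    // maxFileSize: configured maximum region size in bytes (hbase.hregion.max.filesize).
    static long sizeToCheck(int regionCount, long flushSize, long maxFileSize) {
        // With no regions, or with many regions, fall back to the configured maximum.
        if (regionCount <= 0 || regionCount > 100) {
            return maxFileSize;
        }
        long initialSize = 2L * flushSize;
        long cubed = (long) regionCount * regionCount * regionCount;
        return Math.min(maxFileSize, initialSize * cubed);
    }

    public static void main(String[] args) {
        long flushSize = 128L << 20;   // 128 MB, assumed for this example
        long maxFileSize = 10L << 30;  // 10 GB, assumed for this example
        for (int regions = 1; regions <= 5; regions++) {
            long mb = sizeToCheck(regions, flushSize, maxFileSize) >> 20;
            System.out.println(regions + " region(s): split at about " + mb + " MB");
        }
    }
}
```

Under these assumed settings the first region would split at roughly 256 MB, far below the maximum, and the threshold only reaches the configured maximum once a few regions of the table exist on the server. ConstantSizeRegionSplitPolicy skips this ramp-up and always compares against the maximum.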
07-17-2016 09:31 PM
I would also say that you should understand the data you are loading well enough to create reasonable split points. Even if the keys are hashed, you should be able to work out what the first byte/character of the rowKey is and create reasonable split points (using RegionSplitter or by hand).
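As a small illustration of doing it by hand, the sketch below computes evenly spaced split points over the first byte of the rowKey. It assumes the hashed keys are raw bytes whose first byte is roughly uniform over 0x00 to 0xff; if your keys are hex strings, the first character only ranges over 0-9 and a-f and the points would need to be chosen from that range instead. FirstByteSplits and its methods are hypothetical names for this example, not part of HBase or RegionSplitter.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not an HBase class.
final class FirstByteSplits {

    // Returns numRegions - 1 split points, each a first-byte value in 1..255,
    // dividing the 0..255 range into numRegions roughly equal slices.
    static List<Integer> splitPoints(int numRegions) {
        if (numRegions < 1 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be between 1 and 256");
        }
        List<Integer> points = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            points.add(i * 256 / numRegions);
        }
        return points;
    }

    // Formats a first-byte value as two hex digits, for example 64 as "40".
    static String toHex(int firstByte) {
        return String.format("%02x", firstByte);
    }

    public static void main(String[] args) {
        // Print the first-byte boundaries for a table pre-split into 4 regions.
        for (int point : splitPoints(4)) {
            System.out.println("split at first byte 0x" + toHex(point));
        }
    }
}
```

Each value would then be used as a one-byte split key when creating the table. If the first byte of your real keys is skewed rather than uniform, sample the data and pick the boundaries from the observed distribution instead.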