09-01-2018 05:59 AM
For a small table (fewer than 1 million rows, around 1 or 2 GB in size) that will be accessed by many jobs in parallel (insertions and range scans), what is the best way to create this table in terms of pre-splitting?
The goal is good performance and low request latency, while avoiding hotspotting.
The maximum region size in my cluster is 20 GB, and there are 20 nodes running RegionServers.
09-03-2018 03:48 AM
Is 20 splits really the right practice for such a small table? We used to have a lot of splits per table, each of them very small (from a few KB to hundreds of MB), and the result was a large number of regions per RegionServer and a heavy load on HBase.
So we merged regions where possible, based on the maximum region size set in HBase, to get back to an acceptable load.
This is why I'm questioning whether this new table should be split into so many regions. Could you please confirm?
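
To put rough numbers on this concern, here is a minimal sketch of the arithmetic. It assumes the table is about 2 GB and that data is spread evenly across the pre-split regions. The function name `presplit_estimate` and the even-distribution assumption are illustrative only and are not part of HBase.

```python
def presplit_estimate(table_gb, num_regions, max_region_gb, num_servers):
    """Rough sizing for an evenly pre-split table (illustrative helper, not an HBase API).

    Returns (average region size in MB,
             region size as a percentage of the configured max region size,
             average number of this table's regions per RegionServer).
    """
    region_mb = table_gb * 1024 / num_regions
    fill_pct = 100 * (table_gb / num_regions) / max_region_gb
    regions_per_server = num_regions / num_servers
    return region_mb, fill_pct, regions_per_server


if __name__ == "__main__":
    # Figures from this thread: ~2 GB table, 20 GB max region size, 20 RegionServers.
    for n in (1, 5, 20):
        mb, pct, per_server = presplit_estimate(2, n, 20, 20)
        print(f"{n:>2} regions: ~{mb:.0f} MB each, "
              f"{pct:.1f}% of max region size, "
              f"{per_server:.2f} regions per server")
```

With 20 regions, each one would hold roughly 100 MB, well under 1% of the 20 GB maximum, but the table would add only about one region per RegionServer. The estimate says nothing about the request load on each region, which is what matters for hotspotting.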