New Contributor
Posts: 2
Registered: ‎09-01-2018

HBase splitting for a small table

Hi,

 

For a small table (under 1 million rows, around 1 to 2 GB in size) that will be accessed by many jobs in parallel (inserts and range scans), what is the best way to create the table in terms of splitting?

 

The goal is good performance and low request latency, while avoiding hotspotting.

 

The maximum region size in my cluster is 20 GB, and there are 20 nodes running regionservers.

 

Thanks.

Master
Posts: 430
Registered: ‎07-01-2015

Re: HBase splitting for a small table

Make sure you pre-split the table into at least 20 regions (one per regionserver). And keep the split stable, i.e. design the row keys so that the data does not "drift" into a skew over time.
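One way to get stable, evenly spaced split points is to pre-split a hashed (hex-encoded) key space, which is what HBase's HexStringSplit algorithm does at table creation. A minimal sketch of how those boundaries are computed (the function name and the 8-character key width are assumptions for illustration, not something from this thread):

```python
def hex_split_points(num_regions, width=8):
    """Evenly spaced split keys over a hex-encoded key space of the
    given width, mimicking HBase's HexStringSplit algorithm (a sketch)."""
    max_val = 16 ** width
    step = max_val // num_regions
    # num_regions regions need num_regions - 1 split boundaries
    return [format(i * step, "0{}x".format(width)) for i in range(1, num_regions)]

# 19 boundaries carve the key space into 20 regions
splits = hex_split_points(20)
print(splits)
```

In the hbase shell the same effect is available directly at creation time, e.g. `create 'mytable', 'cf', {NUMREGIONS => 20, SPLITALGO => 'HexStringSplit'}` (substitute your own table and column-family names). This only avoids hotspotting if the row keys themselves are uniformly distributed over the hex space, e.g. hashed or salted.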
New Contributor
Posts: 2
Registered: ‎09-01-2018

Re: HBase splitting for a small table

20 splits for such a small table, is that really the right practice? I mean, we used to have many splits per table, each one very small (from a few KB to hundreds of MB), and the result was a large number of regions per regionserver and a huge load on HBase.

 

So we merged regions where possible, based on the maximum region size set in HBase, to get back to an acceptable load.

 

This is why I'm questioning splitting this new table into so many pieces. Can you please confirm?