How to compute the normal region count for a region server?

Rising Star

Hello everyone,

I want to know how many regions per region server is considered normal (HBase version: 3.2.7.964).

I found this formula in the HBase guide:

((RS Xmx) * hbase.regionserver.global.memstore.size) / (hbase.hregion.memstore.flush.size * (# column families))

My region server has 14 GB of physical memory and an 8 GB heap; memstore.size is 0.4, flush.size is 128 MB, and there is 1 column family.

So the normal region count for my region server is (8 * 1024 * 0.4) / (128 * 1) = 25.6.
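For reference, here is the same arithmetic as a small Python sketch. The function and parameter names are made up for illustration and are not part of HBase; the inputs are the values quoted above, with sizes in MB.

```python
def guideline_region_count(heap_mb, memstore_fraction, flush_size_mb, column_families):
    """Guideline region count per region server from the memstore formula.

    heap_mb           -- region server heap (RS Xmx), in MB
    memstore_fraction -- hbase.regionserver.global.memstore.size
    flush_size_mb     -- hbase.hregion.memstore.flush.size, in MB
    column_families   -- number of column families per region
    """
    return (heap_mb * memstore_fraction) / (flush_size_mb * column_families)


# Values from the post: 8 GB heap, 0.4 memstore fraction, 128 MB flush size, 1 CF.
count = guideline_region_count(8 * 1024, 0.4, 128, 1)
print(round(count, 1))
```

Note that the denominator is the flush size multiplied by the number of column families, so adding column families lowers the guideline count.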

But this does not match the actual situation: my region server has 2000 regions, and reads and writes work normally.

Why?

How should I compute the normal region count for a region server?

1 ACCEPTED SOLUTION

Hello Pan,

This question is about node resources and data per region. I am not really sure what your other configurations look like (handlers, GC, cache, region replicas), so I am a little in the dark. The usual formula is:

(RS memory)*(total memstore fraction)/((memstore size)*(# column families))

This calculation is a guideline rather than a hard rule, because the right number also depends on the actual load and query pattern. Your region server can very well hold many more regions, but it will then receive more writes, since it is responsible for more regions. As a result it will buffer and flush very often; under heavy load you are prone to big flush and compaction issues, and probably eventually to region servers going down because they become unresponsive. If only a couple of the 2000 regions are actually active, this is less critical, but it is still not a good pattern.

The same applies on the read side. Given the amount of memory allocated to the cache, with that many regions in frequent use you will end up going to disk very often, which results in poor read performance. You could look at your cache hit/miss ratio to see how your region servers are doing.

Lastly, with that kind of distribution, if one region server goes down the overall loss is probably very large, so it is not ideal for recovery purposes.

Overall, 100-200 regions per RS seems a decent upper ballpark, depending on resources; going much beyond that will need some tuning and monitoring.
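To put rough numbers on the flushing point, here is a small Python sketch using the figures from the question (8 GB heap, 0.4 memstore fraction, 128 MB flush size). The helper name is hypothetical, and the assumption that writes are spread evenly over all active regions is a simplification; real workloads are usually skewed.

```python
def memstore_share_mb(heap_mb, memstore_fraction, active_regions):
    """Average memstore budget per region, assuming writes are spread evenly."""
    return heap_mb * memstore_fraction / active_regions


HEAP_MB = 8 * 1024        # region server heap from the question
MEMSTORE_FRACTION = 0.4   # hbase.regionserver.global.memstore.size
FLUSH_SIZE_MB = 128       # hbase.hregion.memstore.flush.size

for regions in (25, 200, 2000):
    share = memstore_share_mb(HEAP_MB, MEMSTORE_FRACTION, regions)
    note = "can reach flush size" if share >= FLUSH_SIZE_MB else "below flush size"
    print(f"{regions:>5} active regions: {share:8.2f} MB each ({note})")
```

With 2000 regions all taking writes, each region would get about 1.6 MB of memstore on average, far below the 128 MB flush size, so flushes would be triggered by global memory pressure and would produce many small files. If only a handful of regions are active, the picture is much closer to the guideline case.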

Hope this sheds some light

2 REPLIES

Rising Star

I have heard that there are tools that can show which regions on a region server are actually in use. Do you know of such a tool? What is it called?

My region servers actually show 3000-4000 regions each (8 GB heap), so maybe only some of the regions are commonly used.

Otherwise the other 3000 regions would need 3000 * 2 MB = 6 GB, which is impossible.