I have read online (http://hbase.apache.org/0.94/book/important_configurations.html#bigger.regions) that having too many regions can also cause poor latencies. Has anyone seen this?
- As I understand it, hbase.hregion.max.filesize determines how large regions can grow: the smaller the value, the more regions there will be, because regions split more often. Is this setting used only on the region servers, or also on the master? What does the HBase master use hbase.hregion.max.filesize for?
- The HBase documentation points to an online merge utility (online_merge.rb) attached to https://issues.apache.org/jira/browse/HBASE-1621 for doing online merges. Does anyone have experience using this tool?
> that having too many regions is also cause of poor latencies. Has anyone seen this?
Too many regions can slow you down because of the extra connection overhead needed to find and scan data across the table. But smaller regions can also make random reads faster. So the first question should be: latency of what? Scans? Gets? Puts?
> Is this setting used only on region server or also on master server?
It's used only by the RS (splits happen on the RSes). The master does load the property to sanity-check table descriptors, but it does not use the value to perform splits.
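To make the relationship from the question concrete (a smaller hbase.hregion.max.filesize means more regions), here is a back-of-the-envelope sketch in Python. It assumes data is spread evenly and that every region is filled right up to the limit, so it gives only a rough lower bound; real counts depend on the split policy and will usually be higher, since a freshly split region is well under the limit. The function name and the 1 TiB table size are made up for illustration.

```python
GIB = 1024 ** 3


def estimate_region_count(table_bytes, max_filesize_bytes):
    """Rough lower bound on region count if every region were filled to the limit."""
    if max_filesize_bytes <= 0:
        raise ValueError("max_filesize_bytes must be positive")
    # Integer ceiling division; a table always has at least one region.
    return max(1, -(-table_bytes // max_filesize_bytes))


if __name__ == "__main__":
    # Hypothetical 1 TiB table, two candidate values for hbase.hregion.max.filesize.
    table = 1024 * GIB
    for limit_gib in (1, 10):
        regions = estimate_region_count(table, limit_gib * GIB)
        print(f"max.filesize={limit_gib} GiB -> at least {regions} regions")
```

Dividing the result by the number of region servers gives a rough regions-per-RS figure to compare against whatever your version's docs recommend.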
> Does anyone have experience in using this tool?
CDH5 has a more direct command you can use: http://archive.cloudera.com/cdh5/cdh/5/hbase/book.html#_online_region_merges