Created on 04-12-2018 06:40 AM - edited on 07-24-2023 08:50 AM by VidyaSargur
As we understood basic parameters of Hbase in Part 1 , lets try and understand some advanced parameters which should be tried after consultation with experts or by those who know what they are doing.
hbase.hstore.blockingStoreFiles
Default value of this parameter is 10. We know that each memstore flush creates an Hfile (hbase.hregion.memstore.flush.size), now the purpose this parameter serves is to send a message along the write pipeline that unless these many Hfiles are compacted using minor compaction, we should not go ahead with any more flushes. One would see messages in logs such as “Too many HFiles, delaying flush” . But like it says, it can only delay flush up to certain seconds, and even if the writes continue to happen, memstore could stretch only up to the size guided by :
hbase.hregion.memstore.flush.size X hbase.hregion.memstore.block.multiplier
Once this limit reaches , no more writes will be accepted by region server for this region and you will see messages like "org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit” in logs.
Situation will come under control and writes will resume once minor compaction gets over.One can always increase this parameter to avoid any potential issues during such heavy write traffic and could in turn make channels more productive.To help further, one could also increase hbase.hstore.compaction.max to a higher value so that more Hfiles are covered in compaction process. Lets discuss it below in details.
hbase.hstore.compaction.max
Default value is 10. Like I said above, under situations of heavy write load , you can tune this parameter and thus have minor compaction cover more Hfiles and help stuck write traffic resume. Please note that compaction itself has its own IO overhead so keep this in mind when you bump up this number.
hbase.hregion.max.filesize
Default value is 10 GB. Virtually can be used to control the rate of splitting of regions in Hbase. Once “any" one of the store (Column family) within a region reaches this value, the whole region would go for split. To disable splitting virtually , keep it to a very high number like ( 100 gb ) and set Splitting policy to be :
hbase.regionserver.region.split.policy = org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy
zookeeper.session.timeout
Default value is 90 seconds. We know that region server maintain a session with zookeeper server to remain active in cluster, for this each one of the region server has its own ephemeral znode in zookeeper. As soon as session gets disconnected or timed out for any reason, this znode get deleted and region server gets crashed. It is zookeeper.session.timeout which “partly” dictates negotiated session timeout with zookeeper server at the time of startup of region server. Now why “partly” ? This is because zookeeper server itself has a minimum and maximum session timeout defined for its clients like hbase, as per following formula :
Minimum session timeout : 2 X tick time
Maximum session timeout : 20 x tick time
Now, no matter what session timeout you set in your client configs, if your zookeeper server timeout is less than that, it would only negotiate that value with client. For example default tick time in zookeeper configuration is 2 seconds , which means maximum session timeout could not be bigger than 40 seconds, so no matter Hbase keeps its timeout to be 90 seconds, the negotiated value would only be 40 seconds. For increasing session timeouts in hbase so that your region servers could tolerate minor fluctuations coming out of GC pauses or due to network or any other transient problems either at hbase or zookeeper, consider increasing “tick time”.Tick time higher than 4 - 5 seconds is not recommended as it could impact the health of your zookeeper quorum.
hbase.rpc.timeout
Default value is 300000 ms. This is the timeout which a client gets for a RPC call which it makes to Hbase. Set it proportional to what your client / job / query requirements are. However don’t keep it too big that clients report issues and subsequently fail after a long time. When we talk about hbase.rpc.timeout, we talk about two more parameters in the same breath. They are right below.
hbase.client.scanner.timeout.period
Default value is 300000 ms.The time which any hbase client scanner gets to perform its task (scan) on hbase tables. It is a replacement of an earlier parameter called “hbase.regionserver.lease.period” . The difference here is, this timeout is specifically for RPCs that come from the HBase Scanner classes (e.g. ClientScanner) while hbase.rpc.timeout is the default timeout for any RPC call.Please also note that hbase.regionserver.lease.period parameter is deprecated now and this parameter replaces it. Thus looks like this parameter is going to take care of lease timeouts as well on scanners.
Rule of thumb is to keep hbase.rpc.timeout to be equal to or higher than hbase.client.scanner.timeout.period , this is because no matter scanner is doing its job scanning rows but if in between hbase.rpc.timeout expires , the client session would be expired.This could result in exceptions such as “ScannerTimeoutException or UnknownScannerException”.
However, there is one more parameter which comes at play when we discuss above timeouts and that I would discuss below:
hbase.client.scanner.caching
Default value is 100 rows. Basically this is the number of rows which a client scanner pulls from Hbase in one round ( before “scanner.next” could trigger ) and transfers back to the client. Multiple performance issues could arise (and in fact scanner timeout Exceptions as well ) if you set this value to a very high number as scanner would be burdened fetching these many rows and transferring them back to client , now assume region server or underlying HDFS or the network between client and server is slow for any reason, then it can very well lead to RPC session getting expired, which eventually leading failure of the job, messages like “ClosedChannel Exception” are seen when hbase tries to send back rows to the client whose session is already expired.
While Keeping a smaller count leaves cluster and scanner under utilized, higher number consumes resources such as region server heap and client memory in large quantity. Thus a good number is dependent upon how good resources like disk / memory / CPU you have on cluster nodes as well as how many million rows you need to scan. Go high if demands are high.