We're seeing frequent HBase NotServingRegionException errors, with the below stack trace, across different region servers (the cluster has 35 regions). Major compaction is disabled for the environment, and there are no off-peak/on-peak hours added to the config.
Last exception: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException
Params in our config are -
hbase.hstore.compactionThreshold is 3, blocking store files has been set to 200, and hbase.hstore.blockingWaitTime is 5 seconds.
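For reference, those thresholds would map to hbase-site.xml entries along these lines (a sketch, assuming "store files = 200" refers to hbase.hstore.blockingStoreFiles; adjust the keys to match your actual config):

```xml
<!-- Sketch of the settings described above. -->
<!-- hbase.hstore.blockingStoreFiles is an assumption for the "200 store files" value. -->
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>3</value>
</property>
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>200</value>
</property>
<property>
  <name>hbase.hstore.blockingWaitTime</name>
  <!-- value is in milliseconds: 5 seconds -->
  <value>5000</value>
</property>
```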
Here's our application logic: it gets the value for a specific key and then puts it back to the table using the same key.
Please let us know if any other ideas to resolve this issue. Thanks in advance.
NotServingRegionException indicates that the queried region is not ONLINE anywhere in the cluster, meaning none of the RegionServers serve the region. However, the 'hbase:meta' table may still contain a stale record pointing to a specific RegionServer that does not actually host the region.
If the region is stuck in a transition state (like FAILED_CLOSE or FAILED_OPEN) for any reason, the HMaster should highlight it in its WebUI (HMaster WebUI > search for the "Regions in Transition" section).
If nothing is reported for this region in the HMaster WebUI, you can simply try assigning the region (command: "assign '<REGION_ENCODED_NAME>'") from an hbase shell (launch the shell as the 'hbase' user or another user with sufficient privileges) and see whether any RegionServer can open it. If the region assignment fails, review the corresponding RegionServer logs to investigate the reason (the active HMaster's logs will show which RegionServer the assignment was placed on).
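The steps above would look roughly like this from a terminal on a cluster node (the encoded region name is a placeholder, and `sudo -u hbase` is just one common way to run as the 'hbase' user):

```
# launch the shell as the 'hbase' user (or another sufficiently privileged user)
sudo -u hbase hbase shell

# inside the shell: force-assign the region
assign '<REGION_ENCODED_NAME>'
```

If the assign succeeds, the region should show as ONLINE on one RegionServer; if not, check that RegionServer's logs for the open failure.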
@Lingesh This is not specific to one region. It happens to different regions at different times: a region gets taken offline midstream, or requests against it fail with NotServingRegionException.
There are no inconsistencies reported by hbck, and even the HBase Master doesn't show any related exceptions or errors.
What would be the best approach in our case ? Please advise.
We are facing a similar issue in our environment, and it happens almost all the time.
A region is being split from its parent and is in the post-split deploy state, and at that very same timestamp HBase reads and writes are failing.
Regions either split or get moved to a new RegionServer, and until the region is opened, all reads and writes keep failing.
Are there any configurations that need to be modified to avoid this?
Thanks in advance.
Normally, region splits in HBase are lightweight (the main delay is typically the reference-file-creation RPC call to the NameNode) and hence should be pretty fast unless the NameNode is undergoing performance issues. If a client accesses the region during this timeframe, it may experience the said exception, but that should be transient, transparent, and non-fatal to the client application. Do you see any fatal errors in your client application? Have you customized the retry attempts in your client?
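To illustrate why the exception is normally non-fatal: the HBase client retries transient failures like NotServingRegionException with backoff (tunable via settings such as hbase.client.retries.number and hbase.client.pause). Below is a minimal, self-contained sketch of that retry-with-backoff pattern; the `withRetries` helper and the simulated "region in transition" are hypothetical, not HBase client code:

```java
import java.util.concurrent.Callable;

public class RetryExample {

    // Retries the operation up to maxAttempts times, sleeping with
    // exponential backoff between attempts; rethrows the last failure.
    static <T> T withRetries(Callable<T> op, int maxAttempts, long baseBackoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(baseBackoffMs * (1L << (attempt - 1)));
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated region: unavailable for the first two calls (as if in
        // transition after a split), then serving again.
        final int[] calls = {0};
        String value = withRetries(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IllegalStateException("region not serving");
            }
            return "row-value";
        }, 5, 10);
        System.out.println(value + " after " + calls[0] + " attempts");
    }
}
```

With enough retry attempts configured, the client rides out the brief window while the region reopens; a client that fails hard on the first exception (or has retries set very low) is the usual reason applications see these errors as fatal.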