Created on 05-04-2024 11:52 PM - edited 05-05-2024 01:01 AM
2024-05-04 15:27:08,590 WARN [RpcServer.default.FPBQ.Fifo.handler=26,queue=1,port=16000] assignment.AssignmentManager: rit=OPENING, location=l230-n2.<SERVER>,16020,1714828981071, table=<NAMEAPACE>:<TABLE>, region=275bbfd7b7044afcba2deeeebf622b07 reported OPEN on server=l230-n2.<SERVER>,16020,1714828981071 but state has otherwise AND NO procedure is running
We have about 300 regions in transition (RIT) and they are all stuck. Under Procedures & Locks (<hbase master>:16010/procedures.jsp) we have some in state WAITING_TIMEOUT.
We already tried restarting the cluster (HBase/ZooKeeper), but it only got worse. There are 14 nodes in the cluster: 3 have no regions assigned, and a couple (about 4) do all the work.
We're still on HBase 2.0.2, so we can't use HBCK2 (it needs 2.0.3 or later), and in HBase 2.0.2 hbck repair is disabled. Our average region count per server is about 330.
1. How do I get my regions unstuck?
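To get a list of the affected regions, the WARN line above can be pulled out of the master log. A minimal Python sketch, assuming every affected region produces a line in the same format as the one quoted; `stuck_regions`, `regions_per_server` and `SAMPLE` are names made up for illustration:

```python
import re
from collections import Counter

# Matches the AssignmentManager warning quoted in this post. The pattern is
# derived from that single line, so other HBase versions may word it differently.
WARN_RE = re.compile(
    r"rit=(?P<rit>\w+), location=(?P<location>\S+), "
    r"table=(?P<table>\S+), region=(?P<region>[0-9a-f]{32}) "
    r"reported OPEN on server=(?P<server>\S+) "
    r"but state has otherwise AND NO procedure is running"
)


def stuck_regions(log_lines):
    """Map encoded region name -> (RIT state, reporting server)."""
    found = {}
    for line in log_lines:
        match = WARN_RE.search(line)
        if match:
            # Later lines for the same region overwrite earlier ones.
            found[match["region"]] = (match["rit"], match["server"])
    return found


def regions_per_server(found):
    """Count how many stuck regions each server reported as OPEN."""
    return Counter(server for _state, server in found.values())


# The log line from this post, with its placeholders left as posted.
SAMPLE = (
    "2024-05-04 15:27:08,590 WARN [RpcServer.default.FPBQ.Fifo.handler=26,queue=1,port=16000] "
    "assignment.AssignmentManager: rit=OPENING, location=l230-n2.<SERVER>,16020,1714828981071, "
    "table=<NAMEAPACE>:<TABLE>, region=275bbfd7b7044afcba2deeeebf622b07 "
    "reported OPEN on server=l230-n2.<SERVER>,16020,1714828981071 "
    "but state has otherwise AND NO procedure is running"
)

regions = stuck_regions([SAMPLE])
```

With a real log, an open file object can be passed in place of `[SAMPLE]`, since the function only iterates over lines.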
Created 05-10-2024 12:16 AM
We tried decommissioning the two busiest region servers, which together held 80% of all regions. Decommissioning fails because of the same regions stuck in transition. Luckily, a side effect of the decommissioning is that all other regions did move to the remaining 12 servers, so these two servers now hold a combined 3% of all regions. For our customers the cluster is now much more balanced and therefore snappier. But as long as these regions are stuck in transition, automatic balancing will not happen, and at some point we will have an imbalance again.
It seems that the cause is that these 134 regions remain in OPENING, and should become OPEN, which is not going to happen. What is the best approach to mitigate this? Should we use hbck2 to set the region state manually to OPEN?
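To put rough numbers on the rebalancing, here is a back-of-the-envelope Python sketch using only figures quoted in this thread. It assumes the average of about 330 regions per server applies across all 14 servers, which may not be exact, so the results are estimates only:

```python
# Figures quoted in this thread; treating 330 as the average over all
# 14 servers is an assumption, so every result below is an estimate.
SERVERS = 14
AVG_REGIONS_PER_SERVER = 330

total_regions = SERVERS * AVG_REGIONS_PER_SERVER

# Before decommissioning: the two busiest servers held about 80%.
before_on_two = total_regions * 80 // 100
before_each = before_on_two // 2

# After: the same two servers hold about 3%, the other 12 share the rest.
after_on_two = total_regions * 3 // 100
after_each_of_rest = (total_regions - after_on_two) // 12

print(total_regions, before_each, after_on_two, after_each_of_rest)
```

Under that assumption, the 3% left on the two servers comes to roughly 138 regions, which is in the same range as the 134 regions stuck in OPENING.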
Created 05-10-2024 07:09 AM
If the RegionServer reports that the region is already OPEN, try scanning the hbase:meta table from the hbase shell and check what state that region is in.
If it's still in OPENING state in meta, try changing its state to OPEN. Do it for one of the regions first and see whether that brings the RIT count down to 133.
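In the hbase shell the check can be done with `scan 'hbase:meta', {COLUMNS => 'info:state'}`. With this many regions it may help to tally the output. A rough Python sketch, assuming the shell prints one `column=info:state, timestamp=..., value=...` cell per row; the table name, start key and timestamps in the sample are made up, and `states_from_scan` is a hypothetical helper:

```python
import re
from collections import Counter

# Assumed layout of one output row of the scan:
#   <region name> column=info:state, timestamp=<ts>, value=<STATE>
CELL_RE = re.compile(
    r"\s*(?P<row>\S+)\s+column=info:state,\s+timestamp=[^,]+,\s+value=(?P<state>\w+)\s*$"
)


def states_from_scan(text):
    """Map encoded region name -> state found in the info:state column."""
    states = {}
    for line in text.splitlines():
        match = CELL_RE.match(line)
        if match:
            # A region name ends in ".<encoded name>." so take the last dotted part.
            encoded = match["row"].rstrip(".").rsplit(".", 1)[-1]
            states[encoded] = match["state"]
    return states


# Hypothetical scan output; only the first encoded name comes from this thread.
SAMPLE_SCAN = """\
ROW                                  COLUMN+CELL
 ns:t1,,1700000000000.275bbfd7b7044afcba2deeeebf622b07. column=info:state, timestamp=1714828981071, value=OPEN
 ns:t1,row5,1700000000001.0123456789abcdef0123456789abcdef. column=info:state, timestamp=1714828981999, value=OPENING
"""

states = states_from_scan(SAMPLE_SCAN)
summary = Counter(states.values())
not_open = sorted(name for name, state in states.items() if state != "OPEN")
```

`not_open` then holds the encoded names whose meta state is something other than OPEN, which are the candidates for changing one region first.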
Created on 05-15-2024 02:22 AM - edited 05-15-2024 02:23 AM
The current state in hbase:meta is OPEN, not OPENING. I can try setting it to OPEN with HBCK2, but I don't think this will change anything.
Created 05-15-2024 05:29 AM
If the current state in hbase:meta is OPEN, I wouldn't suggest performing any other action to change the state in meta. I suspect there are some procedures running in the background trying to assign the region again. Do you see any such procedure in the "Procedures & Locks" section of the HBase Master Web UI?
Created 05-16-2024 12:21 AM
There are currently no locks/procedures.