Table regions stuck in RIT

Contributor
2024-05-04 15:27:08,590 WARN  [RpcServer.default.FPBQ.Fifo.handler=26,queue=1,port=16000] assignment.AssignmentManager: rit=OPENING, location=l230-n2.<SERVER>,16020,1714828981071, table=<NAMEAPACE>:<TABLE>, region=275bbfd7b7044afcba2deeeebf622b07 reported OPEN on server=l230-n2.<SERVER>,16020,1714828981071 but state has otherwise AND NO procedure is running
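With hundreds of regions stuck, it can help to pull the affected regions out of the Master log instead of reading the warnings one by one. Below is a minimal Python sketch, assuming every relevant Master log line has the same shape as the WARN message above (`rit=..., location=..., table=..., region=... reported OPEN on server=...`). The helper name `parse_rit_warnings` is hypothetical, and the regex may need adjusting for other HBase versions.

```python
import re
from collections import defaultdict

# Pattern for the AssignmentManager warning quoted above. The field layout is
# an assumption based on that single log line; other versions may differ.
RIT_WARN = re.compile(
    r"rit=(?P<rit>\w+), location=(?P<location>\S+), "
    r"table=(?P<table>\S+), region=(?P<region>[0-9a-f]+) "
    r"reported OPEN on server=(?P<server>\S+) but state has otherwise"
)


def parse_rit_warnings(lines):
    """Hypothetical helper: map each reporting server to the set of
    encoded region names it reported OPEN while the Master disagreed."""
    by_server = defaultdict(set)
    for line in lines:
        m = RIT_WARN.search(line)
        if m:
            by_server[m.group("server")].add(m.group("region"))
    return dict(by_server)
```

Passing it an open log file, for example `parse_rit_warnings(open("hbase-master.log"))` (the filename is a placeholder), yields a per-server list of encoded region names that can then be looked up in hbase:meta.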

We have about 300 regions in transition (RIT), and they all show up as stuck on the Procedures & Locks page (<hbase master>:16010/procedures.jsp). Some of them are in state WAITING_TIMEOUT.

We already tried restarting the cluster (HBase/ZooKeeper), but that only made things worse. There are 14 nodes in the cluster: 3 have no regions assigned, and a few (about 4) do all the work.

We're still on HBase 2.0.2, so we can't use HBCK2 (which requires 2.0.3 or later), and in HBase 2.0.2 the hbck repair options are disabled. Our average region count per server is about 330.

1. How do I get my regions unstuck?

Contributor

We tried decommissioning the two busiest RegionServers, which together held 80% of all regions. Decommissioning fails because of the same regions stuck in transition. Luckily, a side effect of decommissioning was that all other regions moved to the remaining 12 servers, so these two servers now hold only 3% of all regions combined. For our customers the cluster is now much more balanced and therefore snappier. But as long as these regions are stuck in transition, the balancer will not run, and at some point we will have an imbalance again.

It seems the cause is that these 134 regions remain in OPENING when they should become OPEN, which is not going to happen on its own. What is the best approach to fix this? Should we use HBCK2 to set the region state manually to OPEN?

Super Collaborator

If the RegionServer reports that the region is already OPEN, scan the hbase:meta table from the HBase shell and check what state that region is in.

If it's still in OPENING state in meta, try changing its state to OPEN. Do it for one of the regions first and see if that brings the RIT count down to 133.
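To check many regions at once rather than one at a time, the state column can be dumped from the HBase shell with `scan 'hbase:meta', {COLUMNS => 'info:state'}` and the output filtered. Below is a minimal Python sketch, assuming each output row looks like ` <table>,<startkey>,<regionid>.<encodedname>. column=info:state, timestamp=<ts>, value=<STATE>`. That row layout is an assumption about the shell's formatting, and `states_from_meta_scan` and `not_open` are hypothetical helper names.

```python
import re

# Assumed shape of one HBase shell scan row for the info:state column:
#   <table>,<startkey>,<regionid>.<encoded>. column=info:state, timestamp=..., value=OPEN
# The 32-hex-character encoded region name sits between the last two dots of the row key.
META_ROW = re.compile(
    r"\.(?P<encoded>[0-9a-f]{32})\.\s+column=info:state,.*?value=(?P<state>\w+)"
)


def states_from_meta_scan(lines):
    """Hypothetical helper: map encoded region name -> state from scan output."""
    states = {}
    for line in lines:
        m = META_ROW.search(line)
        if m:
            states[m.group("encoded")] = m.group("state")
    return states


def not_open(states):
    """Encoded region names whose state in meta is anything other than OPEN."""
    return sorted(r for r, s in states.items() if s != "OPEN")
```

Lines that don't match the assumed pattern (such as the trailing `N row(s)` summary) are simply skipped, so the whole shell output can be fed in unfiltered.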

Contributor

The current state in hbase:meta is OPEN, not OPENING. I can try setting it to OPEN with HBCK2, but I don't think this will change anything.

Super Collaborator

If the current state in hbase:meta is OPEN, I wouldn't suggest any other action to change the state in meta. I suspect some procedures are running in the background trying to assign the region again. Do you see any such procedure in the "Procedures & Locks" section of the HBase Master web UI?

Contributor

There are currently no locks/procedures.

Contributor

We still have 99 RITs. How can we delete tables that are stuck?