
HBase Regions In Transition

Explorer

Hi all,
I'm using HBase with HDP 3.1.0. Ambari shows that a total of 63 regions are in transition (RIT), and this is causing problems for other dependent applications. Is there a way to fix this without deleting any data or directories?
Also, hbck is not working with HBase 2.0.

Any help will be deeply appreciated.

Thank you!


[Screenshot attached: Screenshot 2021-01-27 at 10.23.36 PM.png]


Super Collaborator

Hello @rajatsachan 

 

Thanks for using Cloudera Community. Based on your post, you have two issues:

(I) Regions in Transition (RIT)

(II) HBCK unavailability in HBase v2.x

 

For (I), RIT indicates that the regions are likely in the OPENING, FAILED_OPEN, CLOSING, or FAILED_CLOSE state. For a region to move from, say, RegionServer A to RegionServer B, it is first closed on RegionServer A and then opened on RegionServer B. Depending on the region state, check the logs of RegionServer A (CLOSING, FAILED_CLOSE) or RegionServer B (OPENING, FAILED_OPEN) to find why the region is stuck in transition. Finally, your environment has ~1,900 regions per RegionServer, far beyond the general recommendation of 300 to 400 regions per RegionServer. A region count that high delays WAL splitting, WAL edit replay, and region closing and opening.
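The rule of thumb above (closing states point at the source server, opening states at the target) fits in a tiny lookup. This is a hypothetical helper, not part of HBase; the function name `rs_log_to_check` and the labels "source" and "target" are made up for illustration.

```shell
# Hypothetical helper (not part of HBase): map a region state from the Master UI
# to the RegionServer whose logs should explain it, following the A -> B move.
#   source = RegionServer A, where the region is being closed
#   target = RegionServer B, where the region is being opened
rs_log_to_check() {
  case "$1" in
    CLOSING|FAILED_CLOSE) echo "source" ;;
    OPENING|FAILED_OPEN)  echo "target" ;;
    *) echo "unknown state: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   rs_log_to_check CLOSING   # prints "source"
```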

 

For (II), kindly review link [1]. In short, HBase v2.x requires HBCK v2.x (HBCK2). You can run read-only HBCK1 commands (for example, hbase hbck -details) against HBase v2.x, but no HBCK1 fix command will work there. You need HBCK2, which has to be built as described in the link.
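To make the HBCK1/HBCK2 split concrete, here is a minimal sketch of a guard that prints an HBCK1 command only when it is usable on the given HBase version. The function name `hbck1_command` is hypothetical, and treating every option that starts with `-fix` or `-repair` as a fix command is a simplifying assumption. It never executes anything.

```shell
# Hypothetical guard: print an HBCK1 command only if it is usable on the given
# HBase version. On HBase 2.x, options starting with -fix or -repair are
# rejected (a simplification of "HBCK1 fix commands won't work on v2.x").
hbck1_command() {
  local version="$1" major arg
  shift
  major="${version%%.*}"          # "2.0.2" -> "2"
  for arg in "$@"; do
    case "$arg" in
      -fix*|-repair*)
        if [ "$major" -ge 2 ]; then
          echo "HBCK1 option $arg is not supported on HBase $version; use HBCK2 (hbase hbck -j <hbck2 jar> ...)" >&2
          return 1
        fi
        ;;
    esac
  done
  echo "hbase hbck $*"
}

# Usage:
#   hbck1_command 2.0.2 -details          # prints "hbase hbck -details"
#   hbck1_command 2.0.2 -fixAssignments   # refuses, exit status 1
```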

 

- Smarak

 

[1] https://community.cloudera.com/t5/Community-Articles/HBase-HBCK2-tool-for-HDP-3-x/ta-p/244386

Explorer

Thank you for the prompt reply, @smdas!

When I run hbase hbck, it shows a lot of inconsistencies and two types of errors:

ERROR: Region { meta => CEMSample.pmebktidx,\xCF\x91L\x1B\xAC\xF9\x14\x9A,1602711756619.f99b0dd33f79a9320f3d1e0e663b2498., hdfs => hdfs://BIGMATCHTEST/apps/hbase/data/data/default/CEMSample.pmebktidx/f99b0dd33f79a9320f3d1e0e663b2498, deployed => , replicaId => 0 } not deployed on any region server.

ERROR: There is a hole in the region chain between \x0B\xA2\xE8\xBA.\x8B\xA2\xE8 and \x0C\xED\xE6$3\xB7\x98\x90. You need to create a new .regioninfo and region dir in hdfs to plug the hole.
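If the hbck report is saved to a file, the encoded names of the undeployed regions can be pulled out with grep and sed, and the holes counted. A minimal sketch, assuming the ERROR line layout shown above, where the encoded name is the last 32-hex-character component of the `hdfs =>` path; `not_deployed_regions` and `count_region_holes` are hypothetical names.

```shell
# Hypothetical helpers: read saved `hbase hbck -details` output on stdin.
# Assumption: the encoded region name is the last 32-hex-character component
# of the "hdfs => ..." path, as in the ERROR line above.
not_deployed_regions() {
  grep 'not deployed on any region server' |
    sed -n 's#.*hdfs => [^,]*/\([0-9a-f]\{32\}\), deployed.*#\1#p' |
    sort -u
}

# Count "hole in the region chain" reports (prints 0 if there are none).
count_region_holes() {
  grep -c 'There is a hole in the region chain'
}

# Usage:
#   hbase hbck -details > /tmp/hbck.txt 2>&1
#   not_deployed_regions < /tmp/hbck.txt > /tmp/not_deployed.txt
#   count_region_holes < /tmp/hbck.txt
```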

I tried assigning the regions manually from the HBase shell, but the assignment either takes a long time or hangs.

What can be done to reduce the number of regions in transition?

Thanks!



Super Collaborator

Hello @rajatsachan 

 

Thanks for the details. For the region you are trying to assign via the HBase shell, we need to check the logs of RS A or RS B, depending on the region state shown in the Master UI. The logs will confirm whether the assignment is failing because the region cannot close on RS A or cannot open on RS B. Depending on the region state, check the corresponding RS logs as well as the Master logs.
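Checking both sides usually means searching several log files for one encoded region name. A minimal sketch using plain `grep`; the function name `region_log_lines` is hypothetical, and the log paths in the usage comment are assumptions about a typical HDP layout, so adjust them to your own HBase log directory.

```shell
# Hypothetical helper: show every line mentioning an encoded region name,
# prefixed with the log file it came from (-F: match the name literally).
region_log_lines() {
  local region="$1"
  shift
  grep -H -F -- "$region" "$@"
}

# Usage (log paths are assumptions; check your HBase log directory):
#   region_log_lines f99b0dd33f79a9320f3d1e0e663b2498 \
#     /var/log/hbase/hbase-hbase-master-*.log \
#     /var/log/hbase/hbase-hbase-regionserver-*.log
```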

 

Additionally, you mentioned that the assignment is slow or hangs. Each command gets a procedure ID (PID), visible via "list_procedures" in the HBase shell or on the "Locks & Procedures" page of the HMaster UI. Check whether the PID is in the RUNNABLE, WAITING, or FAILED state.
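With many procedures queued, a rough tally per state can help. A minimal sketch, assuming the `list_procedures` output has been saved to a text file; its exact layout varies between versions, so this just counts lines that mention each state as a whole word. The function name `procedure_state_counts` is hypothetical.

```shell
# Hypothetical helper: count lines of saved procedure output that mention each
# state. -w matches whole words only, so WAITING does not also count
# WAITING_TIMEOUT (the underscore is a word character).
procedure_state_counts() {
  local file="$1" state
  for state in RUNNABLE WAITING WAITING_TIMEOUT FAILED SUCCESS; do
    printf '%s %s\n' "$state" "$(grep -c -w -- "$state" "$file")"
  done
}

# Usage:
#   echo "list_procedures" | hbase shell > /tmp/procs.txt
#   procedure_state_counts /tmp/procs.txt
```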

 

Note that with ~2K regions per RegionServer, slow region assignment is not unexpected.
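The region-density point is simple arithmetic: total regions divided by RegionServer count, compared with the guideline. A minimal sketch using the upper end (400) of the 300 to 400 figure quoted earlier in this thread; `regions_per_rs` is a hypothetical name and the numbers in the usage comment are made up.

```shell
# Hypothetical helper: average regions per RegionServer (integer division),
# compared with a limit (default 400). Prints the average; exit status is 1
# when the average exceeds the limit.
regions_per_rs() {
  local total="$1" servers="$2" limit="${3:-400}" avg
  avg=$(( total / servers ))
  echo "$avg"
  [ "$avg" -le "$limit" ]
}

# Usage with made-up numbers:
#   regions_per_rs 3800 2   # prints 1900, exit status 1
```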

 

- Smarak

Explorer

Hi @smdas ,

The commands are mostly in the WAITING state:
[Screenshot attached: Screenshot 2021-01-28 at 1.49.12 PM.png]

For the first PID, 1011134:
RegionServer A logs:
[Screenshot attached: Screenshot 2021-01-28 at 1.56.15 PM.png]

RegionServer B logs:
[Screenshot attached: Screenshot 2021-01-28 at 1.56.52 PM.png]

Both logs show similar errors, and the errors keep recurring.
Does this help?
Thanks!

Explorer

@smdas 
Also, the HBase UI shows the regions in the CLOSING state:
[Screenshot attached: Screenshot 2021-01-28 at 2.46.02 PM.png]

Super Collaborator

Hello @rajatsachan 

 

Thanks for the update. The errors in the screenshots can be ignored, as they pertain to Solr. The regions are in the CLOSING state, i.e. they have yet to close before they can be opened on a different RegionServer. The PID UI shows that the source RegionServer is "*2383*allstate*". Check the Master logs and the "2383" RegionServer logs for the region ID; they should show why the region is stuck in CLOSING.

 

- Smarak

Explorer

Thank you for the reply @smdas 

I checked the Master logs for the region in question and got this:
[Screenshot attached: Screenshot 2021-01-28 at 3.21.36 PM.png]

It mentions the table involved. Do we need to do something with this table, such as disabling it?


Super Collaborator

Hello @rajatsachan 

 

Based on the Master trace you shared, check the "2385" RS logs for the concerned region ID. The RS logs should provide additional details.

 

- Smarak

Explorer

Hi @sainivedant41 

As a last resort, I used the HBCK2 tool to manually change the state of the regions (from CLOSING to CLOSED).
The supporting link is mentioned in this thread.
You can try it out as well.

Super Collaborator

Hello @rajatsachan 

 

Thanks for the update. Based on it, your team used the HBCK2 jar to move the region state to CLOSED. After that change, were you able to bring the regions online as well? If yes, kindly confirm and share the steps so that fellow Community users like @sainivedant41 can use them too.

 

- Smarak

Explorer

Hi @smdas 

Yes, we were able to bring region servers online and the RIT count was reduced to zero.
With the help of the HBase HBCK2 thread linked earlier, I was able to build the HBCK2 jar. From the HBase Master logs (or UI), I collected the regions that were in transition and then looped through the list:

for i in `cat /apps/hbase/in_CLOSING.txt`; do hbase hbck -j /apps/hbase/hbase-hbck2-1.1.0-SNAPSHOT.jar -s setRegionState $i CLOSED; done;
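The loop above can be made a little more defensive: extract only well-formed 32-hex-character encoded names from the list, skip blanks and duplicates, and preview the commands before running them, since setRegionState was used here as a last resort. A minimal sketch; the variable names `HBCK2_JAR`, `REGION_LIST`, `DRY_RUN` and the functions `run`, `encoded_regions`, `close_stuck_regions`, `assign_regions` are hypothetical. The `assign_regions` step is an optional assumption for the case where regions remain CLOSED afterwards; it was not part of the fix described in this thread.

```shell
# Sketch only. HBCK2_JAR and REGION_LIST default to the paths from the post.
HBCK2_JAR="${HBCK2_JAR:-/apps/hbase/hbase-hbck2-1.1.0-SNAPSHOT.jar}"
REGION_LIST="${REGION_LIST:-/apps/hbase/in_CLOSING.txt}"

# With DRY_RUN=1, print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print the unique 32-hex-character encoded region names found in REGION_LIST,
# which drops blank lines and duplicates.
encoded_regions() {
  grep -o -E '[0-9a-f]{32}' "$REGION_LIST" | sort -u
}

# Same operation as the one-liner in the post, one region at a time.
close_stuck_regions() {
  local region
  encoded_regions | while IFS= read -r region; do
    run hbase hbck -j "$HBCK2_JAR" -s setRegionState "$region" CLOSED
  done
}

# Optional follow-up (assumption: only if regions stay CLOSED afterwards).
# HBCK2's assigns command takes one or more encoded region names.
assign_regions() {
  # Word splitting is intended here: encoded names contain no spaces.
  run hbase hbck -j "$HBCK2_JAR" -s assigns $(encoded_regions)
}

# Usage:
#   DRY_RUN=1 close_stuck_regions   # review the commands first
#   close_stuck_regions
```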

Super Collaborator

Hello @rajatsachan 

 

Thanks for sharing the steps you used to resolve the issue. This will help fellow Community members facing similar issues. If you have no further concerns, kindly mark the post as solved.

 

Thanks, Smarak
