Created on 01-27-2021 09:03 AM - last edited on 01-27-2021 09:45 PM by VidyaSargur
Hi all,
I'm using HBase with HDP 3.1.0. Ambari shows a total of 63 Regions in Transition (RIT). This RIT issue is causing problems for other dependent applications. Is there a way to fix this without deleting any data or directories?
Also, hbck is not working with HBase 2.0.
Any help will be deeply appreciated.
Thank you!
Created 02-02-2021 05:54 AM
Hi @smdas
Yes, we were able to bring region servers online and the RIT count was reduced to zero.
With the help of the HBase HBCK2 article linked in this thread, I was able to build the HBCK2 jar. From the HBase Master logs (or UI) I got the list of regions that were in transition and then simply looped through it:
for i in $(cat /apps/hbase/in_CLOSING.txt); do hbase hbck -j /apps/hbase/hbase-hbck2-1.1.0-SNAPSHOT.jar -s setRegionState "$i" CLOSED; done
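A slightly more defensive variant of the loop above, as a sketch only. The jar path is the one quoted in this post; the function name set_regions_closed and the DRY_RUN switch are made up for this example. With DRY_RUN=1 it only prints the commands it would run, so the list can be reviewed before any region state is changed.

```shell
# Sketch: set every region listed in a file to CLOSED via HBCK2.
# HBCK2_JAR defaults to the jar path quoted in this thread.
# DRY_RUN and set_regions_closed are hypothetical names for this example.
HBCK2_JAR="${HBCK2_JAR:-/apps/hbase/hbase-hbck2-1.1.0-SNAPSHOT.jar}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

set_regions_closed() {
  local list_file="$1" region
  # One encoded region name per line; blank lines are skipped.
  while IFS= read -r region || [ -n "$region" ]; do
    [ -z "$region" ] && continue
    if [ "$DRY_RUN" = "1" ]; then
      echo "hbase hbck -j $HBCK2_JAR -s setRegionState $region CLOSED"
    else
      # </dev/null stops the hbase process from reading the rest of the list.
      hbase hbck -j "$HBCK2_JAR" -s setRegionState "$region" CLOSED </dev/null
    fi
  done < "$list_file"
}
```

Usage: run set_regions_closed /apps/hbase/in_CLOSING.txt once with the default DRY_RUN=1, check the printed commands, then repeat with DRY_RUN=0.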
Created 01-27-2021 10:12 PM
Hello @rajatsachan
Thanks for using Cloudera Community. Based on the post, you have 2 issues:
(I) Region In Transition,
(II) HBCK Unavailability in HBase v2.x.
For (I), RIT indicates the Regions are likely in the OPENING, FAILED_OPEN, CLOSING, or FAILED_CLOSE state. For any Region to move from, say, RegionServer A to RegionServer B, the Region is closed on RegionServer A before being opened on RegionServer B. Depending on the Region state, we need to check either the RegionServer A or the RegionServer B logs to identify the reason for the transition. Finally, your environment has ~1900 Regions per RegionServer, which is well beyond the general recommendation of 300 to 400 Regions per RegionServer. Such a large Region count per RegionServer would delay WAL splitting, WAL edit replay, and Region closing or opening.
For (II), kindly review Link [1]. In short, HBase v2.x requires HBCK v2.x (HBCK2). You can use HBCK1 read-only commands (example: hbase hbck -details) on HBase v2.x, but no HBCK1 fix command will work on HBase v2.x. You need to use HBCK2, which has to be built as described in the link.
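To make the two invocation styles concrete, here is a minimal sketch. The helper names hbck1_report and hbck2 are hypothetical, and the default jar path is simply the one quoted elsewhere in this thread; point HBCK2_JAR at whatever jar you actually built.

```shell
# Sketch: wrappers for the read-only HBCK1 report and for HBCK2 commands.
# hbck1_report and hbck2 are hypothetical helper names; HBCK2_JAR is an
# assumption and must point at the HBCK2 jar you built.
HBCK2_JAR="${HBCK2_JAR:-/apps/hbase/hbase-hbck2-1.1.0-SNAPSHOT.jar}"

# Read-only HBCK1 report; it only reports, it does not fix anything.
hbck1_report() {
  hbase hbck -details
}

# Run an HBCK2 command through the stock launcher by passing the jar with -j.
hbck2() {
  hbase hbck -j "$HBCK2_JAR" "$@"
}
```

Usage: hbck1_report to collect the inconsistency report, then for example hbck2 setRegionState followed by an encoded region name and the target state.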
- Smarak
[1] https://community.cloudera.com/t5/Community-Articles/HBase-HBCK2-tool-for-HDP-3-x/ta-p/244386
Created 01-27-2021 11:42 PM
Thank you for the prompt reply @smdas !
When I run hbase hbck, it shows a lot of inconsistencies and two types of errors:
ERROR: Region { meta => CEMSample.pmebktidx,\xCF\x91L\x1B\xAC\xF9\x14\x9A,1602711756619.f99b0dd33f79a9320f3d1e0e663b2498., hdfs => hdfs://BIGMATCHTEST/apps/hbase/data/data/default/CEMSample.pmebktidx/f99b0dd33f79a9320f3d1e0e663b2498, deployed => , replicaId => 0 } not deployed on any region server.
ERROR: There is a hole in the region chain between \x0B\xA2\xE8\xBA.\x8B\xA2\xE8 and \x0C\xED\xE6$3\xB7\x98\x90. You need to create a new .regioninfo and region dir in hdfs to plug the hole.
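For a long report, the encoded region names of the "not deployed" errors can be pulled out with a small filter. This is only a sketch: it assumes the encoded name is the 32-character hex string between the last two dots of the region name, as in the first error above, and the function name undeployed_regions is made up here.

```shell
# Sketch: print the unique encoded region names from HBCK1
# "not deployed on any region server" errors.
# Reads the files given as arguments, or stdin if none are given.
# undeployed_regions is a hypothetical helper name.
undeployed_regions() {
  grep 'not deployed on any region server' "$@" \
    | grep -oE '\.[0-9a-f]{32}\.' \
    | tr -d '.' \
    | sort -u
}
```

Usage: save the hbck output to a file first and pass that file to undeployed_regions, or pipe the output into it.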
I tried assigning the regions manually from the hbase shell, but it either takes a very long time or hangs.
What should be done to decrease the Regions in transition?
Thanks!
Created 01-28-2021 12:04 AM
Hello @rajatsachan
Thanks for the details. For the Region which you are trying to assign via the HBase shell, we need to check the logs (the same RS A and RS B, depending on the Region state in the Master UI). The logs would confirm whether the assignment is failing because the Region fails to close on RS A or fails to open on RS B. Depending on the Region state, check the RS logs and the Master logs.
Additionally, you mentioned the assignment is taking time or hanging. Each command has a procedure ID (PID), visible via "list_procedures" in the HBase shell or "Locks & Procedures" in the HMaster UI. We can check whether the PID is in the RUNNABLE, WAITING, or FAILED state.
Note that you have ~2K Regions per RegionServer, so slow Region assignment isn't unexpected.
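As a rough sketch, the procedure list can also be dumped without opening an interactive shell and then filtered by state. The helper names list_procedures_raw and filter_state are hypothetical, and the filter is a plain whole-word match on the line, so it assumes the state name appears as its own word in each row.

```shell
# Sketch: dump procedures non-interactively and filter by state.
# list_procedures_raw and filter_state are hypothetical helper names.

# -n runs the HBase shell in non-interactive mode, reading commands from stdin.
list_procedures_raw() {
  echo "list_procedures" | hbase shell -n
}

# Keep only the lines containing the given state as a whole word,
# e.g. WAITING matches "WAITING" but not "WAITING_TIMEOUT".
filter_state() {
  grep -w -- "$1"
}
```

Usage: list_procedures_raw | filter_state WAITING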
- Smarak
Created on 01-28-2021 12:30 AM - last edited on 01-28-2021 11:54 PM by K23
Hi @smdas ,
The commands are mostly in the WAITING state:
For the first ID, 1011134:
RegionServer A logs:
RegionServer B logs:
Both logs have similar errors and they are recurring too.
Does this help?
Thanks!
Created 01-28-2021 01:16 AM
@smdas
Also, from the HBase UI, the regions are in the CLOSING state:
Created 01-28-2021 01:36 AM
Hello @rajatsachan
Thanks for the update. The errors shown in the screenshot can be ignored, as they pertain to Solr. The Regions are in the CLOSING state, i.e. the Regions are yet to be closed before they can be opened on a different RegionServer. The PID UI shows the source RegionServer is "*2383*allstate*". We need to check the Master logs and the "2383" RegionServer logs for the Region ID; they should show the reason for the CLOSING state.
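One simple way to do that check is to search the log files for the encoded Region ID. A minimal sketch follows; the function name region_log_lines is made up, and the log file locations depend on your installation, so pass in whichever Master and RegionServer log files apply.

```shell
# Sketch: show every log line (with line numbers) that mentions a region.
# region_log_lines is a hypothetical helper name.
# Arguments: the encoded region name, then one or more log files.
region_log_lines() {
  local region="$1"
  shift
  # -F treats the region name as a fixed string, -n prints line numbers.
  grep -nF -- "$region" "$@"
}
```

Usage: region_log_lines followed by the encoded Region ID and the paths of the Master and RegionServer log files.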
- Smarak
Created 01-28-2021 01:55 AM
Thank you for the reply @smdas
I checked the Master Logs for the region in question and got this:
It mentions the table involved. Do we need to do something with this table? Disable it or something?
Created 01-28-2021 07:56 AM
Hello @rajatsachan
Based on the Master trace you shared, check the RS "2385" logs for the concerned Region ID. The RS logs should provide additional details.
- Smarak
Created 02-01-2021 04:11 AM
Hi @sainivedant41
As a last resort, I used the HBCK2 tool to manually change the state of the regions (from CLOSING to CLOSED).
The supporting link is mentioned in this thread.
You can try it out as well.