
HBase - Region in Transition

Contributor

HBase keeps having a region stuck in transition:

 

Regions in Transition

Region:        1588230740 (hbase:meta,,1.1588230740)
State:         FAILED_OPEN, ts=Thu Apr 23 12:15:49 ICT 2015 (8924s ago), server=02slave.mabu.com,60020,1429765579823
RIT time (ms): 8924009

Total number of Regions in Transition for more than 60000 milliseconds: 1
Total number of Regions in Transition: 1

 

I've tried "sudo -u hbase hbase hbck -repair" and also "unassign 'hbase:meta,,1.1588230740'" in the HBase shell, but neither fixes the problem.

1 ACCEPTED SOLUTION

Community Manager

1. Stop HBase
2. Move your original /hbase back into place
3. Use a zookeeper cli such as "hbase zkcli"[1] and run "rmr /hbase" to delete the HBase znodes
4. Restart HBase. It will recreate the znodes
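Put together as a shell sketch (the /tmp/hbase path is an assumption from earlier in this thread; on CDH, stop and start HBase from Cloudera Manager rather than by hand):

```shell
# 1. Stop HBase first (via your cluster manager or init scripts).

# 2. Move the original /hbase back into place in HDFS,
#    assuming it was previously moved aside to /tmp/hbase:
sudo -u hdfs hdfs dfs -mv /tmp/hbase /hbase

# 3. Open the bundled ZooKeeper CLI...
hbase zkcli
#    ...then, inside the zkcli shell, delete the HBase znodes:
#    rmr /hbase

# 4. Start HBase again; it recreates the znodes on startup.
```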

 

If HBase fails to start after this, you can always try the offline Meta repair:
hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair

 

Also check for inconsistencies after HBase is up. As the hbase user, run "hbase hbck -details" [2]. If inconsistencies are reported, I would normally use the "ERROR" messages from the hbck output to decide on the best repair method, but since you were willing to start over, just run "hbase hbck -repair".
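For example, you can pull just the ERROR lines out of a saved hbck report to decide on a repair strategy. The report file path and the sample report contents below are illustrative, not real hbck output:

```shell
# On a live cluster you would first capture the report as the hbase user:
#   sudo -u hbase hbase hbck -details > /tmp/hbck-report.txt 2>&1

# A made-up sample report stands in for real hbck output here
cat > /tmp/hbck-report.txt <<'EOF'
ERROR: Region { meta => t1,,1429. } not deployed on any region server.
Table t1 is okay.
ERROR: There is a hole in the region chain.
0 inconsistencies detected.
EOF

# Keep only the ERROR lines
grep '^ERROR' /tmp/hbck-report.txt
```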

 


 

[1] http://hbase.apache.org/book.html#trouble.tools
[2] http://hbase.apache.org/book.html#hbck.in.depth



David Wilder, Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Learn more about the Cloudera Community:

Terms of Service

Community Guidelines

How to use the forum


18 REPLIES

Contributor

This sounds like a bug. Check the bugs reported about regions in transition and compare them against the version you are using.

If repair doesn't work, the only solution I see is to take a snapshot, truncate the table, and then restore from the snapshot (maybe try cloning the snapshot into another table before you truncate the main one).
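A sketch of that snapshot route in the HBase shell; 'mytable' and the snapshot names are placeholders, not tables from this thread:

```shell
# Feed HBase shell commands from a heredoc
hbase shell <<'EOF'
snapshot 'mytable', 'mytable-snap'
# verify the data survives in a copy before touching the original
clone_snapshot 'mytable-snap', 'mytable_verify'
# then recreate the original from the snapshot
disable 'mytable'
restore_snapshot 'mytable-snap'
enable 'mytable'
EOF
```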

Contributor

Thanks for the reply, Gonzalo.

I can't do anything with HBase right now (can't disable or drop tables, can't even view the sample table in Hue); everything is just stuck. So I manually deleted all the table data on HDFS, keeping only the default (sample) table, but it still won't work.

Contributor

If restarting the master didn't do anything, I would say the hbase znode in ZooKeeper is messed up.

If you have nothing to lose: stop HBase, delete the znode in ZooKeeper, delete the /hbase folder in HDFS, and start HBase again.

Contributor

Thanks Gonzalo,

I've tried deleting the "/hbase" folder (just moving it to /tmp), but when I restart HBase the master can't start. As I remember, it's because of the ownership of "/", which belongs to HDFS, and I don't want to chown "/" to hbase.

Even after I created a new "/hbase" and chowned it to hbase:hbase, the HBase master still won't start unless I move the old "/hbase" back.

About the znode in ZooKeeper, I really don't know much about it. I just know my ZooKeeper Znode Parent is "/hbase". Do I just delete this folder, or do I have to delete something else?


Contributor

Thanks denloe,

The command "hbase hbck -repair" solved 6 inconsistencies. I think the step that deleted the znodes is what made this command work, because I had tried "hbase hbck -repair" before and it just got stuck. HBase is working fine now, thank you.

Explorer

I have tried your steps, but I still have inconsistencies, and hbck -repair does not work. My inconsistencies are with data tables and not with META.

I get the following error message:

 

INFO util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned

 

and it eventually times out. I am using CDH 5.4.4 with HBase 1.0.0. I cannot do anything in HBase (count, scan, etc.).

Contributor

In your case, if "hdfs fsck" doesn't fix the files, you are going to have to delete the corrupted HDFS table files.

If you can load the data again, probably the best thing is to delete the /hbase directory in HDFS altogether, restart, and load the data again.
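To find and remove the corrupt files mentioned above, a sketch using hdfs fsck (run as the hdfs superuser; note that -delete permanently destroys that data):

```shell
# List files with corrupt blocks under the HBase root
sudo -u hdfs hdfs fsck /hbase -list-corruptfileblocks

# Delete files that are beyond repair (irreversible)
sudo -u hdfs hdfs fsck /hbase -delete
```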

Explorer

Thank you, Gonzalo. That was exactly what I was not hoping to hear, but seeing that I could not use HBase, I had to do it. After deleting all the files/tables with inconsistencies, I got HBase to work again. Thank you for your help.

Contributor

When you run OfflineMetaRepair, you will most likely run it as your own user or as root. You may then get opaque errors like "java.lang.AbstractMethodError: org.apache.hadoop.hbase.ipc.RpcScheduler.getWriteQueueLength()".

 

If you check in HDFS, you may see that the meta directory is no longer owned by hbase:

 

$ hdfs dfs -ls /hbase/data/hbase/
Found 2 items
drwxr-xr-x   - root  hbase          0 2017-09-12 13:58 /hbase/data/hbase/meta
drwxr-xr-x   - hbase hbase          0 2016-06-15 15:02 /hbase/data/hbase/namespace

Manually running chown -R on it and restarting HBase fixed it for me.
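As a sketch, the fix plus a small awk filter that flags entries not owned by hbase in "hdfs dfs -ls" output; the sample listing below stands in for a live cluster:

```shell
# The fix itself (run as a superuser; path per the listing above):
#   sudo -u hdfs hdfs dfs -chown -R hbase:hbase /hbase/data/hbase/meta

# Sample 'hdfs dfs -ls /hbase/data/hbase/' output
cat > /tmp/ls-output.txt <<'EOF'
drwxr-xr-x   - root  hbase          0 2017-09-12 13:58 /hbase/data/hbase/meta
drwxr-xr-x   - hbase hbase          0 2016-06-15 15:02 /hbase/data/hbase/namespace
EOF

# Print paths whose owner (3rd field) is not 'hbase'
awk '$3 != "hbase" {print $NF}' /tmp/ls-output.txt
# prints: /hbase/data/hbase/meta
```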

 

 

Explorer

I have done 4 different upgrades (on 4 different clusters), and I get this error every time. I have to wipe out /hbase and lose the data, which is the exact opposite of why I am doing an upgrade.

 

There must be a step missing from the upgrade instructions; I have followed them each time.

 

I tried your solution here, and it doesn't work. 

 

When I try to restart HBase, the master fails and I get this:

 

 

Failed to become active master
org.apache.hadoop.hbase.util.FileSystemVersionException: HBase file layout needs to be upgraded. You have version null and I want version 8. Consult http://hbase.apache.org/book.html for further information about upgrading HBase. Is your hbase.rootdir valid? If so, you may need to run 'hbase hbck -fixVersionFile'.

 

I run "hbase hbck -fixVersionFile" and it gets stuck on

5/09/28 20:51:58 INFO client.RpcRetryingCaller: Call exception, tries=14, retries=35, started=128696 ms ago, cancelled=false, msg=
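For context on that error: "You have version null" means the version file is missing or unreadable in the HBase root directory. A sketch to check for it and rerun the fix with the right identity (the default /hbase root is an assumption):

```shell
# The version file lives directly under the HBase root directory
hdfs dfs -ls /hbase/hbase.version

# Run the repair as the hbase user so the recreated file has the right owner
sudo -u hbase hbase hbck -fixVersionFile
```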

 

 

New Contributor

If you delete the /hbase directory in ZooKeeper, you might be able to keep the data.

Explorer

"If you delete the /hbase directory in zookeeper, you might be able to keep the data."

 

Thanks for the response. I am not sure how to delete this just in ZooKeeper. Is there a command for that?

Contributor

You have to use the command line.

It should be something like this:

# Start the command line and connect to any of the zk servers.
# If you are not using CDH, the command is zkCli.sh instead.
# If your cluster is kerberized, you need to kinit first, otherwise the delete will fail.
zookeeper-client -server localhost:2181

# Once in the shell, run this to delete the znode with the metadata:
rmr /hbase

 

Master Collaborator

Hey everyone, this is a great thread and I might be showing my "HBase age" here with old advice, but unless something has changed in recent versions of HBase, you cannot use these steps if you are using HBase replication.  

 

The replication counter, which stores the progress of your synchronization between clusters, is kept as a znode under /hbase/replication in ZooKeeper, so you'll completely blow away your replication state if you do an "rmr /hbase".

 

Please be super careful with these instructions. And to answer @Amanda's question in this thread about why this happens with each upgrade: this RIT problem usually appears when HBase was not cleanly shut down. Maybe you're trying to upgrade or move things around while HBase is still running?

Explorer

Hi there Clint,

What would you suggest be done when HBase gets a region stuck in transition? I am all ears!

 

Thanks! 

 

Amanda 

Master Collaborator

Well, it's been a couple of years since I supported HBase, but what we used to do is delete all the znodes in the /hbase directory in ZK EXCEPT for the /hbase/replication dir. You just have to be a little more surgical with what you're deleting in that RIT situation, if you're using the HBase replication feature to back your cluster up to a secondary cluster. If not, the previous advice is fine.
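A sketch of that more surgical cleanup. The znode names below are examples only; inside the "hbase zkcli" shell, run "ls /hbase" to see your cluster's actual children:

```shell
# Example children of /hbase (get the real list with 'ls /hbase' in zkcli)
children="meta-region-server rs splitWAL table replication master"

# Build the deletion commands, skipping the replication znode
for z in $children; do
  [ "$z" = "replication" ] && continue   # keep replication progress
  echo "rmr /hbase/$z"                   # feed these lines to 'hbase zkcli'
done
```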

 

Ultimately, regions should not get stuck in transition, though. What version of HBase are you running? We used to have tons of bugs in older versions that would cause this situation, but those should have been resolved long ago.

Contributor

Nowadays there is a "clean" operation in the shell admin utilities that can be used to remove data files, ZK data, or both.

I guess that tool takes into consideration what you are pointing out.
