How to fix "offline regions" in HBase

Super Collaborator

Our cluster recently had some issues related to network outages.

When all the dust settled, HBase eventually "healed" itself, and almost everything is working well again, with a couple of exceptions.

In particular, we have one table where almost every query times out, which was never the case before. At around 400 million rows, it is very small compared to most of our other tables.

(Clarification: we query it through Phoenix over JDBC.)

When I look at the web UI (for example, http://<my server>:16010/master-status#storeStats), it shows '1' under "offline regions" for that table, which has 33 regions in total. Almost all the other tables show '0'.

Can anyone help me troubleshoot this?

I know there is a CLI tool for fixing HBase issues. I'm wondering whether that "offline region" is the cause of these timeouts.

If not, how can I figure out what is?

Thanks!

13 replies

Contributor

You can use the Master UI to find which region is offline.

To troubleshoot the root cause, please share the master log and the name of the offline region.
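Once you know which region is offline, a small Python sketch like the following can pull the matching lines out of the master log. It assumes the region name is copied from the Master UI in the usual `TABLE,startkey,regionid.ENCODED.` form, where ENCODED is the 32-character hex name that also shows up in HDFS paths and log messages. The script name and function names are hypothetical; this only does a plain substring search and makes no assumptions about the log message format.

```python
import re
import sys

# Current HBase region names end in ".<32 hex chars>." (an MD5-based encoded name).
_ENCODED = re.compile(r"^[0-9a-f]{32}$")


def encoded_region_name(name):
    """Return the encoded name from 'TABLE,startkey,regionid.ENCODED.' or from a bare encoded name."""
    if _ENCODED.match(name):
        return name
    if not name.endswith("."):
        raise ValueError(f"not a full region name: {name!r}")
    # Start keys may contain dots, so split from the right.
    encoded = name[:-1].rsplit(".", 1)[-1]
    if not _ENCODED.match(encoded):
        raise ValueError(f"no encoded name found in {name!r}")
    return encoded


def lines_mentioning(log_lines, encoded):
    """Yield (line_number, text) for each log line that contains the encoded name."""
    for number, line in enumerate(log_lines, start=1):
        if encoded in line:
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    # Usage (hypothetical script name): python region_log.py <region name> <master log file>
    if len(sys.argv) == 3:
        target = encoded_region_name(sys.argv[1])
        with open(sys.argv[2], errors="replace") as log:
            for number, text in lines_mentioning(log, target):
                print(f"{number}: {text}")
```

Splitting from the right matters because a start key can contain dots, while the encoded name never does.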

Master Collaborator

Zack:

You can use the HFile tool to inspect this file:

MY_BROKEN_TABLE/8a444fa1979524e97eb002ce8aa2d7aa/0/4f9a5c26ddb0413aa4eb64a869ab4a2c

http://hbase.apache.org/book.html#hfile_tool
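The HFile tool is run as `hbase hfile`. The Python sketch below builds the full HDFS path for a store file like the one above and runs the tool with `-v` (verbose), `-m` (print the meta block), and optionally `-p` (print key/values). It assumes the table is in the `default` namespace and that the HBase root directory is `/hbase`. The real root comes from `hbase.rootdir` and differs between distributions, so treat that value, the script name, and the function names as hypothetical. The `0` in the path is Phoenix's default column family.

```python
import posixpath
import subprocess
import sys


def hfile_path(root_dir, table, region, family, hfile, namespace="default"):
    """Store file path: <root>/data/<namespace>/<table>/<encoded region>/<family>/<hfile>."""
    return posixpath.join(root_dir, "data", namespace, table, region, family, hfile)


def hfile_tool_args(path, print_meta=True, print_kv=False):
    """argv for the HFile tool: -v verbose, -m meta block, -p key/values, -f file."""
    args = ["hbase", "hfile", "-v"]
    if print_meta:
        args.append("-m")
    if print_kv:
        args.append("-p")
    args += ["-f", path]
    return args


if __name__ == "__main__":
    # Usage (hypothetical script name): python inspect_hfile.py <full HFile path>
    # Requires the hbase CLI on PATH and read access to the file in HDFS.
    if len(sys.argv) == 2:
        subprocess.run(hfile_tool_args(sys.argv[1]), check=True)
```

Because `-p` prints every cell, it is usually better to start with `-m` and only dump key/values if the metadata looks sane.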

Master Collaborator

Zack:

Can you check the other regions that failed to open (such as a97029c18889b3b3168d11f910ef04ae)?

Contributor

I think you need to check the folder permissions. There are two places to check: `/var/log/hbase` and `/hadoop/hbase/local/jars/tmp/`. After I chowned those folders to the hbase user, the region started successfully. Give it a try.
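To check those two folders before changing anything, a minimal Python sketch like this reports paths that are missing, not owned by the expected user, or lack the owner-write bit. It assumes the HBase daemons run as a local user named `hbase`. The function names are hypothetical, and the sketch only reads metadata; it does not chown anything.

```python
import os
import pwd


def owner_name(path):
    """User name owning `path`, or the numeric uid if it has no passwd entry."""
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def ownership_problems(paths, expected_user="hbase"):
    """List problems for paths that are missing, owned by another user,
    or lack the owner-write permission bit."""
    problems = []
    for path in paths:
        if not os.path.exists(path):
            problems.append(f"{path}: does not exist")
            continue
        owner = owner_name(path)
        if owner != expected_user:
            problems.append(f"{path}: owned by {owner}, expected {expected_user}")
        if not os.stat(path).st_mode & 0o200:
            problems.append(f"{path}: owner has no write permission")
    return problems


if __name__ == "__main__":
    # The two directories named in the reply above; adjust for your installation.
    for problem in ownership_problems(["/var/log/hbase", "/hadoop/hbase/local/jars/tmp/"]):
        print(problem)
```

This checks only the top-level directories. Files inside them can still have the wrong owner, which is why a recursive chown is the usual fix.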