Created 08-29-2016 04:36 PM
Our cluster recently had some issues related to network outages.
When all the dust settled, HBase eventually "healed" itself, and almost everything is back to working well, with a couple of exceptions.
In particular, we have one table where almost every query times out - which was never the case before. It's very small compared to most of our other tables at around 400 million rows.
(Clarification: we query through Phoenix over JDBC.)
When I look at the GUI tools (like http://<my server>:16010/master-status#storeStats) it shows '1' under "offline regions" for that table (it has 33 total regions). Almost all the other tables show '0'.
Can anyone help me troubleshoot this?
I know there is a CLI tool for fixing HBase issues. I'm wondering whether that "offline region" is the cause of these timeouts.
If not, how can I figure it out?
Thanks!
Created 08-29-2016 04:44 PM
You can use the Master UI to find which region is offline.
To troubleshoot the root cause, please share the master log and the name of the region that is offline.
Created 08-29-2016 05:47 PM
Zack:
You can use the hfile tool to inspect:
MY_BROKEN_TABLE/8a444fa1979524e97eb002ce8aa2d7aa/0/4f9a5c26ddb0413aa4eb64a869ab4a2c
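For reference, that path reads as table / encoded region name / column family / store file. Here is a minimal Python sketch that splits it into those parts and builds an hfile tool command line. The data directory `/hbase/data/default` is a hypothetical placeholder, not taken from this thread, so substitute whatever your cluster uses. The helper names are made up for illustration.

```python
from pathlib import PurePosixPath


def split_store_file_path(relative_path):
    """Split '<table>/<region>/<family>/<hfile>' into named parts."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) != 4:
        raise ValueError(f"expected 4 path components, got {len(parts)}")
    table, region, family, hfile = parts
    return {"table": table, "region": region, "family": family, "hfile": hfile}


def hfile_command(hbase_data_dir, relative_path):
    """Build an argument list for the hfile tool.

    -v is verbose output, -m prints the file metadata, -f names the file.
    """
    full_path = PurePosixPath(hbase_data_dir) / relative_path
    return ["hbase", "hfile", "-v", "-m", "-f", str(full_path)]


# ASSUMPTION: hypothetical data directory; replace with your cluster's value.
DATA_DIR = "/hbase/data/default"
STORE_FILE = (
    "MY_BROKEN_TABLE/8a444fa1979524e97eb002ce8aa2d7aa/0/"
    "4f9a5c26ddb0413aa4eb64a869ab4a2c"
)

print(split_store_file_path(STORE_FILE))
print(" ".join(hfile_command(DATA_DIR, STORE_FILE)))
```

The script only prints the command; run the printed line yourself on a node that has the HBase client installed.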
Created 08-29-2016 05:49 PM
Zack:
Can you check other regions which failed to open (such as a97029c18889b3b3168d11f910ef04ae)?
Created 06-26-2017 06:49 AM
I think you need to check the folder access permissions. There are two places to check: `/var/log/hbase` and `/hadoop/hbase/local/jars/tmp/`. After I chowned those folders to hbase, the region started successfully. Give it a try, and good luck.
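If it helps, here is a rough Python sketch for checking who owns those two folders on a region server before changing anything. It assumes the HBase service runs as a local user named `hbase`; that account name and the helper names are assumptions on my part, so adjust them for your setup.

```python
import os
import pwd


def owner_of(path):
    """Return the owning user name of path, or the numeric uid if unnamed."""
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def wrong_owner(paths, expected_user):
    """Return (path, status) pairs for paths missing or not owned by expected_user."""
    problems = []
    for path in paths:
        if not os.path.exists(path):
            problems.append((path, "missing"))
            continue
        owner = owner_of(path)
        if owner != expected_user:
            problems.append((path, "owned by " + owner))
    return problems


# The two folders named in the post above; "hbase" is an assumed service user.
FOLDERS = ["/var/log/hbase", "/hadoop/hbase/local/jars/tmp/"]
for path, status in wrong_owner(FOLDERS, "hbase"):
    print(f"{path}: {status}")
```

It prints nothing when both folders exist and are owned by the expected user. It checks only the top-level folders, not the files inside them.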