HBase Regions stuck in FAILED_OPEN state

Expert Contributor

I have an HBase cluster that is having a major problem at the moment. My namespace and META tables appear to be working correctly, but the regions for my table are not being deployed on region servers. Instead they become stuck in the FAILED_OPEN state, often for longer than 20 minutes. Because they are classed as regions in transition, balancing fails and cannot help. I have searched the log and found nothing that looks useful. I have tried the following:

  • hbase hbck -repair table_name
  • hbase hbck -repairHoles table_name
  • hbase hbck -fixMeta -fixAssignment table_name
  • echo "assign 'region_name'" | hbase shell

None of these has helped. I have checked that HDFS is not corrupt: hdfs fsck / reports that it is healthy.

When I run hbase hbck -details table_name, the only inconsistencies listed are that the regions are not deployed. I also followed a recommendation I found online:

1. Stop HBase

2. Use a zookeeper cli and run "rmr /hbase" to delete the HBase znodes

3. Run offlineMetaRepair

4. Restart HBase. It will recreate the znodes

This still does not solve my problem. Is there anything more that anybody can suggest? I don't want to truncate the tables, since we have over two months' worth of data that we need to keep.

1 ACCEPTED SOLUTION

Master Collaborator

Can you find the encoded region name for the regions stuck in the FAILED_OPEN state and pastebin the related region server log?

It would help us understand what caused the region not to open.

BTW, which HDP version are you running?
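
For anyone following along, here is a minimal sketch of that lookup. It assumes the full region name has the usual table,startkey,timestamp.encodedname. shape, so the encoded name is the last dot-delimited field. The function names and the log file argument are illustrative and not part of HBase; point the second function at wherever your region server log actually lives.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not HBase tooling.

# Print the encoded region name: the last dot-delimited field of a
# full region name such as "table,startkey,timestamp.encodedname."
encoded_region_name() {
  local full="${1%.}"          # drop the trailing dot, if present
  printf '%s\n' "${full##*.}"  # keep everything after the last dot
}

# Print every line of a region server log that mentions the region,
# followed by some trailing context lines to catch stack traces.
region_log_lines() {
  local encoded="$1" logfile="$2" context="${3:-20}"
  grep -n -F -A "$context" -- "$encoded" "$logfile"
}
```

With a made-up region name, usage might look like: region_log_lines "$(encoded_region_name 'my_table,,1460000000000.0123456789abcdef0123456789abcdef.')" /path/to/regionserver.log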


3 REPLIES

Expert Contributor

You inadvertently solved my problem. I had not seen that the HBase Master tells you which server it is trying to load to. I pulled up region server logs and found the following line:

org.apache.hadoop.security.AccessControlException: Permission denied

We had mistakenly changed the owner of /apps/hbase to hdfs, so the hbase user could no longer write to it. Running hdfs dfs -chown -R hbase /apps/hbase allowed the regions to be assigned correctly. I really appreciate your help.

For what it's worth, we're running HDP 2.4.
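
A minimal sketch of checking for this condition up front, in case it helps someone else. It assumes the HBase root directory is /apps/hbase and the service user is hbase, as on our cluster; the function name is illustrative. It reads the owner from the third column of hdfs dfs -ls -d output and only looks at the top-level directory, not at everything beneath it.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the name is illustrative.

# Succeed only if the HDFS path is owned by the expected user.
# The owner is the third column of `hdfs dfs -ls -d <path>` output.
hbase_root_owned_by() {
  local path="${1:-/apps/hbase}" expected="${2:-hbase}" owner
  owner="$(hdfs dfs -ls -d "$path" | awk '{ print $3; exit }')"
  if [ "$owner" = "$expected" ]; then
    echo "OK: $path is owned by $owner"
  else
    echo "MISMATCH: $path is owned by '$owner', expected '$expected'" >&2
    return 1
  fi
}
```

If hbase_root_owned_by /apps/hbase hbase fails, the chown above is the fix that worked for us.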

Check your master logs: search for the region that is in transition, note which region server it is trying to open the region on, and then check the logs of that region server.

If the logs look normal and all of your regions in transition are failing to open on a single region server, stop that region server and see whether the regions then open properly elsewhere.