Zombie HBase Table

HBase Version: 1.2.0

 

A truncate of a table with preserved splits hung, and I had to kill the session. Afterwards, the HBase master reported the following:

 

2020-10-21 11:51:39,428 INFO org.apache.hadoop.hbase.master.HMaster: Client=research//10.19.25.18 truncate table
2020-10-21 11:51:39,876 INFO org.apache.hadoop.hbase.MetaTableAccessor: Deleted [{ENCODED => 190c395e0419552552ec2472c212109b, NAME => ' '
2020-10-21 12:55:49,600 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /hbase/.tmp/data/default/table_name/b533d42b2b0da96cd7f960619b8ce6f1/.regioninfo (inode 1164868456): File does not exist. [Lease.  Holder: DFSClient_NONMAPREDUCE_-316682701_1, pending creates: 3]

2020-10-21 12:55:49,626 WARN org.apache.hadoop.hbase.master.procedure.TruncateTableProcedure: Retriable error trying to truncate table='' state=TRUNCATE_TABLE_CREATE_FS_LAYOUT
java.io.IOException: java.util.concurrent.ExecutionException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /hbase/.tmp/data/default/''/b533d42b2b0da96cd7f960619b8ce6f1/.regioninfo (inode 1164868456): File does not exist. [Lease.  Holder: DFSClient_NONMAPREDUCE_-316682701_1, pending creates: 3]

2020-10-21 13:44:58,519 WARN org.apache.hadoop.hbase.master.procedure.TruncateTableProcedure: Retriable error trying to truncate table='' state=TRUNCATE_TABLE_CREATE_FS_LAYOUT
java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: The specified region already exists on disk: hdfs://nameservice1/hbase/.tmp/data/default/''eef3595091cdf51af7488dca37398617

 

 

The HBase master has been logging the same error repeatedly for the past two days.

 

 

hbase(main):001:0> list_procedures
 Id Name State Start_Time Last_Update
 15357 TruncateTableProcedure (table=’’ preserveSplits=true) RUNNABLE Wed Oct 21 12:55:47 -0400 2020 Thu Oct 22 22:54:26 -0400 2020
 15358 TruncateTableProcedure (table=’’ preserveSplits=true) RUNNABLE Wed Oct 21 13:00:47 -0400 2020 Wed Oct 21 13:00:47 -0400 2020
 15359 TruncateTableProcedure (table=’’ preserveSplits=true) RUNNABLE Wed Oct 21 13:05:47 -0400 2020 Wed Oct 21 13:05:47 -0400 2020

 

 

  • hbase shell -> list command displays the table
  • describe table -> ERROR: Unknown table
  • zkCli -> Does not list the table
  • hbck table_name first displayed the below error:

 

 

ERROR: Table lock acquire attempt found:[tableName=default:’’, lockOwner=’’, threadId=122, purpose=TruncateTableProcedure (table=’’preserveSplits=true) id=15357 owner=’’state=RUNNABLE:TRUNCATE_TABLE_PRE_OPERATION, isShared=false, createTime=1603299347500]
Summary:
Table hbase:meta is okay.
    Number of regions: 1
    Deployed on:  ‘’
1 inconsistencies detected.
Status: INCONSISTENT

 

 

  • Ran hbck -fixTableLocks table_name: the lock was released and no inconsistencies were detected

 

 

Summary:
Table hbase:meta is okay.
    Number of regions: 1
    Deployed on:  ''
0 inconsistencies detected.
Status: OK

 

 

  • The table still appears in the list command output, and the HBase master log still shows retries of the truncate
  • The files exist on hdfs directory as below:

 

$ hdfs dfs -ls hdfs://nameservice1/hbase/.tmp/data/default/*
Found 4 items
drwxr-xr-x   - hbase hbase          0 2020-10-21 12:55 hdfs://nameservice1/hbase/.tmp/data/default/<table_name>/.tabledesc
drwxr-xr-x   - hbase hbase          0 2020-10-21 12:55 hdfs://nameservice1/hbase/.tmp/data/default/<table_name>/.tmp
drwxr-xr-x   - hbase hbase          0 2020-10-21 12:55 hdfs://nameservice1/hbase/.tmp/data/default/<table_name>/b533d42b2b0da96cd7f960619b8ce6f1
drwxr-xr-x   - hbase hbase          0 2020-10-21 12:55 hdfs://nameservice1/hbase/.tmp/data/default/<table_name>/eef3595091cdf51af7488dca37398617

 

 

  • A .regioninfo file exists for both regions at hdfs://nameservice1/hbase/.tmp/data/default/table_name/region_name/.regioninfo, but each is 0 bytes.
  • .tableinfo exists and is correct
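
The 0-byte .regioninfo check described above can be scripted. The following is only a minimal sketch: the function name empty_regioninfo is hypothetical, and it assumes the usual 8-column layout of `hdfs dfs -ls -R` output (permissions, replication, owner, group, size, date, time, path) and paths without spaces.

```shell
#!/bin/sh
# Print the path of every zero-length .regioninfo file.
# Input (stdin): the text output of `hdfs dfs -ls -R <dir>`.
# Assumption: column 5 is the size in bytes and the last column is the path.
empty_regioninfo() {
  # $1 starts with "-" for regular files ("d" would be a directory)
  awk '$1 ~ /^-/ && $5 == 0 && $NF ~ /\/\.regioninfo$/ { print $NF }'
}

# Possible usage (table_name is a placeholder, as in the listing above):
#   hdfs dfs -ls -R /hbase/.tmp/data/default/table_name | empty_regioninfo
```

Any path it prints is a region whose .regioninfo was created but never written, which is consistent with the LeaseExpiredException in the master log.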

 

The table is a staging table and can be dropped safely. What is the best way to recover from such a scenario? Any input would be helpful.

 

Thank you.

1 REPLY

Hello @MG-1 

 

The symptoms appear to match HBASE-20616.

 

In your case, follow the steps below:

(I) Confirm that the only RUNNABLE procedures are TruncateTableProcedure.

(II) Stop the HMaster services (active and standby).

(III) Sideline the contents of the MasterProcWALs directory (/hbase/MasterProcWALs).

(IV) Sideline the table's region directories from the "/hbase/.tmp/" directory.

(V) Start the HMaster services.
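
Step (I) can be checked mechanically against saved list_procedures output. This is a rough sketch only: the function name non_truncate_runnable is hypothetical, and it assumes the column layout shown in the question (numeric Id first, procedure name second, RUNNABLE as a separate word).

```shell
#!/bin/sh
# Print every RUNNABLE procedure that is NOT a TruncateTableProcedure.
# Input (stdin): the text output of `list_procedures` from the HBase shell.
# Assumption: field 1 is the numeric procedure Id, field 2 the procedure name.
non_truncate_runnable() {
  awk '$1 ~ /^[0-9]+$/ && / RUNNABLE / && $2 != "TruncateTableProcedure"'
}

# Possible usage (piping a command into the HBase shell is an assumption
# about how the output is captured; a saved file works just as well):
#   echo "list_procedures" | hbase shell | non_truncate_runnable
```

Empty output means only truncate procedures are pending. If anything is printed, sidelining the MasterProcWALs directory would discard those other procedures too, so it is worth reviewing them first.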

 

Sidelining the contents of the MasterProcWALs directory ensures the Master does not re-attempt the TruncateTableProcedure after restart. The table's region directories under "/hbase/.tmp/" must be sidelined as well; otherwise, the table cannot be created again.
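
Steps (III) and (IV) could be sketched roughly as below. This is an illustration under assumptions, not a verified procedure: the sideline directory and the function names are hypothetical, the whole table directory under /hbase/.tmp is moved (one reading of "region directories"), and the commands are meant to run as a user with write access to /hbase while both HMasters are stopped. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Dry-run by default: prints each hdfs command instead of executing it.
# Set DRY_RUN=0 to actually run them (only while the HMasters are stopped).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# sideline_truncate_state <table_name> <sideline_dir>
# <sideline_dir> is a hypothetical HDFS location outside /hbase.
sideline_truncate_state() {
  table="$1"
  sideline="$2"
  run hdfs dfs -mkdir -p "$sideline/MasterProcWALs" "$sideline/tmp"
  # (III) Move the procedure WAL files so the Master does not replay them.
  # The glob is quoted so that HDFS, not the local shell, expands it.
  run hdfs dfs -mv '/hbase/MasterProcWALs/*' "$sideline/MasterProcWALs/"
  # (IV) Move the half-created table layout out of /hbase/.tmp.
  run hdfs dfs -mv "/hbase/.tmp/data/default/$table" "$sideline/tmp/"
}

# Possible usage, reviewing the printed commands first:
#   sideline_truncate_state table_name /hbase-sideline
```

Moving rather than deleting keeps the files available in case they need to be restored. Steps (II) and (V), stopping and starting the HMaster services, depend on how the cluster is managed and are left out of the sketch.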

 

- Smarak