CDH 6.3.2 - HBase 2 table problem

Explorer

I have inherited a problem.

There are 48 regions of one table in transition (RIT) in hbase:meta.

Data has been removed from the table, most likely manually in an attempt to fix the RIT issues. These RITs are probably the result of a network outage mid-operation.

The table is currently ENABLED and cannot be DISABLED. The previous techie already attempted this, which resulted in LOCKS/Procedures for DISABLE and DELETE as well as the RITs.

The table is no longer required, so it can be deleted. HDFS reported it as being only 6k, so I removed the table directories and deleted the znodes via the ZooKeeper shell. This fixed the locks/procedures messages, but Cloudera Manager (CM) still reports 48 regions in transition and, as a result, balancing is not working.

What I need is a way to remove the rows from 'hbase:meta', as this is the only place where this table is still referenced.

Sample output:

alfa:rfilenameext column=table:state, timestamp=1604493139455, value=\x08\x00
alfa:rfilenameext,,1557760164826.35925292c25898671e5a894ce387e167. column=info:regioninfo, timestamp=1604388258225, value={ENCODED => 35925292c25898671e5a894ce387e167, NAME => 'alfa:rfilenameext,,1557760164826.35925292c25898671e5a894ce387e167.', STARTKEY => '', ENDKEY => '0'}
alfa:rfilenameext,,1557760164826.35925292c25898671e5a894ce387e167. column=info:seqnumDuringOpen, timestamp=1601269814633, value=\x00\x00\x00\x00\x00\x00\x008
alfa:rfilenameext,,1557760164826.35925292c25898671e5a894ce387e167. column=info:server, timestamp=1601269814633, value=ba-wtmp04.asgardalfa.hq.com:16020
alfa:rfilenameext,,1557760164826.35925292c25898671e5a894ce387e167. column=info:serverstartcode, timestamp=1601269814633, value=1601061167123
alfa:rfilenameext,,1557760164826.35925292c25898671e5a894ce387e167. column=info:sn, timestamp=1604388258050, value=ba-wtmp04.asgardalfa.hq.com,16020,1601061167123
alfa:rfilenameext,,1557760164826.35925292c25898671e5a894ce387e167. column=info:state, timestamp=1604388258225, value=CLOSED
alfa:rfilenameext,0,1557760164826.787d1455b84f2d846ce6089392f01fd2. column=info:regioninfo, timestamp=1600969938610, value={ENCODED => 787d1455b84f2d846ce6089392f01fd2, NAME => 'alfa:rfilenameext,0,1557760164826.787d1455b84f2d846ce6089392f01fd2.', STARTKEY => '0', ENDKEY => '1'}
alfa:rfilenameext,0,1557760164826.787d1455b84f2d846ce6089392f01fd2. column=info:seqnumDuringOpen, timestamp=1600780187722, value=\x00\x00\x00\x00\x00\x02^\xBB
alfa:rfilenameext,0,1557760164826.787d1455b84f2d846ce6089392f01fd2. column=info:server, timestamp=1600780187722, value=ba-wtmp08.asgardalfa.hq.com:16020
alfa:rfilenameext,0,1557760164826.787d1455b84f2d846ce6089392f01fd2. column=info:serverstartcode, timestamp=1600780187722, value=1600780162556
alfa:rfilenameext,0,1557760164826.787d1455b84f2d846ce6089392f01fd2. column=info:sn, timestamp=1600969938610, value=ba-wtmp07.asgardalfa.hq.com,16020,1600936054386
alfa:rfilenameext,0,1557760164826.787d1455b84f2d846ce6089392f01fd2. column=info:state, timestamp=1600969938610, value=OPENING
alfa:rfilenameext,1,1557760164826.aa9d89b40a9def31a080fdd1776acb4e. column=info:regioninfo, timestamp=1601060563980, value={ENCODED => aa9d89b40a9def31a080fdd1776acb4e, NAME => 'alfa:rfilenameext,1,1557760164826.aa9d89b40a9def31a080fdd1776acb4e.', STARTKEY => '1', ENDKEY => '2'}
alfa:rfilenameext,1,1557760164826.aa9d89b40a9def31a080fdd1776acb4e. column=info:seqnumDuringOpen, timestamp=1600780186976, value=\x00\x00\x00\x00\x00\x02^\xA9
alfa:rfilenameext,1,1557760164826.aa9d89b40a9def31a080fdd1776acb4e. column=info:server, timestamp=1600780186976, value=dr1-wtmp02.asgardalfa.hq.com:16020
alfa:rfilenameext,1,1557760164826.aa9d89b40a9def31a080fdd1776acb4e. column=info:serverstartcode, timestamp=1600780186976, value=1600780163021
alfa:rfilenameext,1,1557760164826.aa9d89b40a9def31a080fdd1776acb4e. column=info:sn, timestamp=1601060563980, value=ba-wtmp05.asgardalfa.hq.com,16020,1601049145467
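
A dump like the one above is easier to reason about once it is grouped per region. A minimal Python sketch, assuming the `scan 'hbase:meta'` output has been saved to a text file in exactly this one-cell-per-line layout and that row keys contain no whitespace; `encoded_name` and `region_states` are hypothetical helpers, not part of any HBase tool. It maps each region's encoded name (the last dot-separated part of the row key) to its `info:state`, so regions stuck in states such as OPENING stand out.

```python
import re

# One cell per line, as printed by the HBase shell's scan:
#   <rowkey> column=<family:qualifier>, timestamp=<ts>, value=<value>
# Assumption: row keys contain no whitespace (true for the sample above).
CELL_RE = re.compile(r"^(\S+)\s+column=([^,]+), timestamp=(\d+), value=(.*)$")


def encoded_name(row_key):
    """'ns:table,startkey,regionid.ENCODED.' -> 'ENCODED'."""
    return row_key.rstrip(".").rsplit(".", 1)[-1]


def region_states(lines):
    """Map encoded region name -> info:state value (None if no state cell seen).

    The table's own row ('ns:table', no comma in the key, holding
    table:state) is skipped; only region rows are reported.
    """
    states = {}
    for line in lines:
        match = CELL_RE.match(line.strip())
        if not match:
            continue  # blank lines, the ROW/COLUMN+CELL header, "N row(s)" footer
        row, column, _timestamp, value = match.groups()
        if "," not in row:
            continue
        enc = encoded_name(row)
        states.setdefault(enc, None)
        if column == "info:state":
            states[enc] = value
    return states
```

For example, `with open("meta_scan.txt") as f: print(region_states(f))` (file name hypothetical). On the excerpt above, the first region is CLOSED, the second is OPENING, and the third has no state because its `info:state` cell falls outside the excerpt.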

I have been scanning various sources, but they have not been very clear or relevant. For this problem I just want to remove all references (rows) related to the alfa:rfilenameext table from the hbase:meta table. How this happens is of no importance.

However, there are other tables on this cluster that are still needed, so I am wary of rebuilding the entire meta table.

Apologies in advance... I am a complete HBase newbie and was hoping there was a command such as:

delete 'alfa:rfilenameext' from 'hbase:meta'

which might serve to remove all rows for that table.
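
The HBase shell has no statement of that shape; deletes work per row key, e.g. `deleteall 'hbase:meta', '<rowkey>'`. In hbase:meta, one table's rows are the table's own row (`alfa:rfilenameext`, holding `table:state`) plus one row per region, keyed `alfa:rfilenameext,<startkey>,<regionid>.<encodedname>.`. A minimal Python sketch, assuming you already have the row keys (for example from `scan 'hbase:meta', {ROWPREFIXFILTER => 'alfa:rfilenameext'}`), that selects exactly this table's rows and emits one `deleteall` per row. The helper names are hypothetical. As the replies below point out, manual edits to hbase:meta are discouraged, so treat this as a last-resort illustration: keep a copy of the full `scan 'hbase:meta'` output first, and expect to restart HMaster afterwards, since the HBCK2 README quoted later suggests meta changes made outside the Master are not reflected in the Masters' cache.

```python
def belongs_to_table(row_key, table):
    """True for the table's own row ('ns:table', holding table:state) and its
    region rows ('ns:table,<startkey>,<regionid>.<encoded>.').

    The bare prefix 'ns:table' would also match another table such as
    'ns:table2', so region rows are matched on 'ns:table,' (with the comma).
    """
    return row_key == table or row_key.startswith(table + ",")


def deleteall_commands(row_keys, table):
    """HBase shell 'deleteall' lines for every hbase:meta row of `table`.

    Scan output repeats a row key once per cell, so duplicates are collapsed
    while keeping order. Assumes printable row keys without single quotes;
    binary start keys would need escaping that is not handled here.
    """
    matching = dict.fromkeys(k for k in row_keys if belongs_to_table(k, table))
    return ["deleteall 'hbase:meta', '%s'" % key for key in matching]
```

The prefix check is the design point: ROWPREFIXFILTER in the shell matches on raw bytes, so a table whose name merely starts with `rfilenameext` would show up in that scan too, and the exact-match-or-comma rule filters it out before anything is deleted.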

Accepted solution: Smarak's reply answering the three follow-up questions (extraRegionsInMeta, deleting from hbase:meta, setRegionState), which appears in full in the thread below.

Super Collaborator

Hello @TGH

Thanks for using Cloudera Community. You had regions in transition (RIT), and the HDFS directories for those regions have been removed along with their znodes, yet HBase still reports RIT. You wish to fix the RIT issue by removing the hbase:meta entries, since RIT prevents the Balancer from running. In HBase 2 (CDH 6.3.x), the MasterProcWALs are critical for any procedure that is stuck or blocked. You mentioned that many (Disable|Delete) procedures were observed.

The graceful way for your team to handle this is to use the HBCK2 tool, which you can build using Link [1]. Next, use HBCK2 to bypass the procedures (PIDs) associated with the table whose region directories have been removed. Once a PID is bypassed, the "Locks & Procedures" section of the HMaster UI shows it as "Bypass". After ensuring the required PIDs are bypassed, restart the HMaster service and use HBCK2 to remove the meta entries for the regions whose HDFS directories were removed. Use the "bypass" and "extraRegionsInMeta" HBCK2 commands as documented in Link [1].
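
The sequence above, written out as HBCK2 invocations. A minimal Python sketch that only builds and prints the command lines; the jar path and the PIDs 1234/1235 are hypothetical placeholders (take the real PIDs from the "Locks & Procedures" page), and the HMaster restart between the bypass and the meta cleanup is left as a manual step. `bypass -o` and `extraRegionsInMeta --fix` are the options documented in the HBCK2 README quoted later in this thread; whether a given HBCK2 build supports extraRegionsInMeta depends on its version.

```python
import shlex

# Hypothetical jar location: point this at wherever your HBCK2 build lives.
HBCK2_JAR = "/opt/hbase-operator-tools/hbase-hbck2/hbase-hbck2.jar"


def hbck2(*args, jar=HBCK2_JAR):
    """argv for running HBCK2 through 'hbase hbck -j <jar> <command> ...'."""
    return ["hbase", "hbck", "-j", jar, *args]


def cleanup_plan(pids, table):
    """Ordered HBCK2 invocations for the approach described above.

    1. bypass the stuck procedures (-o: override if running/stuck)
       -> restart HMaster here (manual, not shown)
    2. extraRegionsInMeta without --fix: report only
    3. extraRegionsInMeta --fix: remove meta rows that have no region dir
    Nothing is executed; the caller decides what to run.
    """
    return [
        hbck2("bypass", "-o", *[str(p) for p in pids]),
        hbck2("extraRegionsInMeta", table),
        hbck2("extraRegionsInMeta", "--fix", table),
    ]


for argv in cleanup_plan([1234, 1235], "alfa:rfilenameext"):
    print(shlex.join(argv))
```

Keeping the report-only run as a separate step lets you compare its list against the expected 48 regions before anything is removed.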

Alternatively, you can stop HMaster > remove the MasterProcWALs (after confirming there are no RUNNABLE procedures other than the PIDs associated with the table whose region directories were removed) > start HMaster. However, this isn't an ideal approach, and you may encounter the "Master Is Initialising" issue, which requires the HBCK2 tool to resolve. The "Master Is Initialising" scenario is covered in Link [1] as well.
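
The precondition in that alternative ("no RUNNABLE procedures excluding the PID associated with the table") can be stated as a small check. A minimal Python sketch, assuming the procedure list has already been copied out (for example from the HMaster "Locks & Procedures" page or the shell's `list_procedures`) into `(pid, state, table)` tuples; that tuple layout and the helper name are assumptions, not an HBase format.

```python
from typing import Iterable, List, Tuple

# (pid, state, table) as read off the procedure list; the layout is an assumption.
Procedure = Tuple[int, str, str]


def runnable_blockers(procedures: Iterable[Procedure], table: str) -> List[int]:
    """PIDs of RUNNABLE procedures that do NOT belong to `table`.

    An empty result matches the precondition described above for removing
    MasterProcWALs; any PID returned here would most likely be lost along
    with the WALs, so stop and investigate it first.
    """
    return [pid for pid, state, tbl in procedures if state == "RUNNABLE" and tbl != table]
```

For example, with one RUNNABLE procedure for `alfa:rfilenameext` and another RUNNABLE one for some other table, only the second PID is returned, and the WALs should not be removed yet.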

- Smarak

[1] https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2

Explorer

Hi @smdas,

In my post I neglected to mention that last week I shut down HBase, removed the MasterProcWALs, removed the table's directory structure from HDFS, deleted the znodes in ZooKeeper, and restarted HBase. This removed all references to the affected table. There are currently no locks or procedures for this table.

The HBase UI reports those regions in transition, I assume only because they are contained in hbase:meta, but there is nothing in storage, nothing in ZooKeeper, and CM shows no table details because the table is no longer there.

I have been poring over the HBCK2 documentation and cannot find an operation that acts on individual hbase:meta rows, only general options for fixing meta as a global operation.

Super Collaborator

Hello @TGH

Sharing the steps for building the HBCK2 jar from the Git repository. Additionally, refer to the post in [1] for details on building the HBCK2 tool.

[Screenshot: steps for building the HBCK2 jar from the Git repository]

- Smarak

[1] https://community.cloudera.com/t5/Support-Questions/How-to-get-hbck2-tool-for-CDH-6-3-2/m-p/295867/h...

Explorer

Getting it and/or building it is not the problem. Does it have a facility to remove redundant meta information without corrupting the meta data for the tables we still need?

Why is 'delete' or 'deleteall' unsuitable? Is there no way to remove only the rows containing that table name? After all, if it is just a table like every other table, then there must be a normal database method to selectively remove data.

Is it possible to eradicate the transition errors (in this case) by manually altering the state OPENING -> OPEN -> CLOSING -> CLOSED? Would this 'finalise' the transitions to the point where regular HBase operation removes them from meta?

Super Collaborator

Hello @TGH

Thanks for the response. To your queries:

(I) HBCK2 has extraRegionsInMeta for removing regions from hbase:meta that don't have any HDFS directories. Running the HBCK2 tool with that command lists the regions in meta that aren't present in HDFS, and adding the fix flag (-f) removes them as well.

(II) Using the delete command on hbase:meta isn't an issue in itself, yet we generally avoid making any manual changes to hbase:meta. It's more of a recommendation, to avoid a manual oversight causing hbase:meta corruption.

(III) You can change the region state via the HBCK2 setRegionState command. Note that the HBCK2 Git page recommends using this command only as a last resort, given its risky nature. If you are aware of the associated risk, you can run the commands to set the TableState (setTableState) or the RegionState (setRegionState).
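
On walking each region through OPENING -> OPEN -> CLOSING -> CLOSED: setRegionState takes the target state as an argument, so stepping through intermediate states should not be needed. A minimal Python sketch that builds the last-resort HBCK2 argument lines for this table, assuming the current states are known (e.g. from a parsed `scan 'hbase:meta'`); each line is meant to follow `hbase hbck -j <jar>`. Setting the table to DISABLED before closing its regions is an assumed ordering, not something stated in this thread, and whether the Master then drops the regions from RIT may still require an HMaster restart.

```python
def last_resort_commands(table, region_states, region_target="CLOSED", table_target="DISABLED"):
    """HBCK2 argument lines forcing a table and its regions into a settled state.

    `region_states` maps encoded region name -> current info:state (None if
    unknown). Regions already at `region_target` are skipped. Nothing is
    executed; review every line before running it.
    """
    lines = ["setTableState %s %s" % (table, table_target)]
    for enc, state in sorted(region_states.items()):
        if state != region_target:
            lines.append("setRegionState %s %s" % (enc, region_target))
    return lines
```

With the three regions from the original sample, this yields the setTableState line plus setRegionState lines for the OPENING region and the region whose state is unknown; the already CLOSED region is left alone.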

- Smarak

Explorer

I downloaded and extracted hbase-operator-tools-1.0.0 via the link:

Downloads\hbase-operator-tools-1.0.0-bin.tar\hbase-operator-tools-1.0.0\hbase-hbck2

The README makes no mention of 'extraRegionsInMeta', so I am a bit confused about where that comes from.

By the way, I'm not sure if this is a factor, but we have active and standby masters.

HBase version: 2.1.0-cdh6.3.2

Super Collaborator

Hello @TGH

I downloaded the HBCK2 tool following the steps shared, and I could see "extraRegionsInMeta" listed in the README.md file.

- Smarak

Explorer

Sorry, I see that the version of the README on this page:

https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2

has extraRegionsInMeta:

extraRegionsInMeta <NAMESPACE|NAMESPACE:TABLENAME>...
   Options:
    -f, --fix    fix meta by removing all extra regions found.
   Reports regions present on hbase:meta, but with no related
   directories on the file system. Needs hbase:meta to be online.
   For each table name passed as parameter, performs diff
   between regions available in hbase:meta and region dirs on the given
   file system. Extra regions would get deleted from Meta
   if passed the --fix option.
   NOTE: Before deciding on use the "--fix" option, it's worth check if
   reported extra regions are overlapping with existing valid regions.
   If so, then "extraRegionsInMeta --fix" is indeed the optimal solution.
   Otherwise, "assigns" command is the simpler solution, as it recreates
   regions dirs in the filesystem, if not existing.
   An example triggering extra regions report for tables 'table_1'
   and 'table_2', under default namespace:
     $ HBCK2 extraRegionsInMeta default:table_1 default:table_2
   An example triggering extra regions report for table 'table_1'
   under default namespace, and for all tables from namespace 'ns1':
     $ HBCK2 extraRegionsInMeta default:table_1 ns1
   Returns list of extra regions for each table passed as parameter, or
   for each table on namespaces specified as parameter.

But the README contained in the .tgz file does NOT have this information, and it is misleading to include it in one and not in the actual src/bin distribution. For comparison, the command section of the .tgz README reads:

Command:
 addFsRegionsMissingInMeta <NAMESPACE|NAMESPACE:TABLENAME>...
   Options:
    -d,--force_disable    aborts fix for table if disable fails.
   To be used when some regions may be missing from hbase:meta
   but their directories are present in HDFS. This is a 'lighter'
   version of 'OfflineMetaRepair' tool commonly used for similar
   issues in hbase-1.x. This command needs hbase:meta to be online.
   For each table name passed as parameter, it performs a diff
   between regions available in hbase:meta and region dirs on HDFS.
   Then for dirs with no hbase:meta matches, it reads the 'regioninfo'
   metadata file and re-creates given region in hbase:meta. Regions are
   re-created in 'CLOSED' state in the hbase:meta table, but not in the
   Masters' cache, and they are not assigned either. To get these
   regions online, run the HBCK2 'assigns' command printed when this
   command-run completes.
   NOTE: If using hbase releases older than 2.3.0, a rolling restart of
   HMasters is needed prior to executing the provided 'assigns' command.
   An example adding missing regions for tables 'tbl_1' in the default
   namespace, 'tbl_2' in namespace 'n1' and for all tables from
   namespace 'n2':
     $ HBCK2 addFsRegionsMissingInMeta default:tbl_1 n1:tbl_2 n2
   Returns HBCK2 an 'assigns' command with all re-inserted regions.
   SEE ALSO: reportMissingRegionsInMeta

 assigns [OPTIONS] <ENCODED_REGIONNAME>...
   Options:
    -o,--override  override ownership by another procedure
   A 'raw' assign that can be used even during Master initialization (if
   the -skip flag is specified). Skirts Coprocessors. Pass one or more
   encoded region names. 1588230740 is the hard-coded name for the
   hbase:meta region and de00010733901a05f5a2a3a382e27dd4 is an example of
   what a user-space encoded region name looks like. For example:
     $ HBCK2 assign 1588230740 de00010733901a05f5a2a3a382e27dd4
   Returns the pid(s) of the created AssignProcedure(s) or -1 if none.

 bypass [OPTIONS] <PID>...
   Options:
    -o,--override   override if procedure is running/stuck
    -r,--recursive  bypass parent and its children. SLOW! EXPENSIVE!
    -w,--lockWait   milliseconds to wait before giving up; default=1
   Pass one (or more) procedure 'pid's to skip to procedure finish. Parent
   of bypassed procedure will also be skipped to the finish. Entities will
   be left in an inconsistent state and will require manual fixup. May
   need Master restart to clear locks still held. Bypass fails if
   procedure has children. Add 'recursive' if all you have is a parent pid
   to finish parent and children. This is SLOW, and dangerous so use
   selectively. Does not always work.

 filesystem [OPTIONS] [<TABLENAME>...]
   Options:
    -f, --fix    sideline corrupt hfiles, bad links, and references.
   Report on corrupt hfiles, references, broken links, and integrity.
   Pass '--fix' to sideline corrupt files and links. '--fix' does NOT
   fix integrity issues; i.e. 'holes' or 'orphan' regions. Pass one or
   more tablenames to narrow checkup. Default checks all tables and
   restores 'hbase.version' if missing. Interacts with the filesystem
   only! Modified regions need to be reopened to pick-up changes.

 fixMeta

Regardless, if it does exist, it should do the job I want. I shall report the outcome after running it.
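
The upstream README text quoted above describes extraRegionsInMeta as a diff between the regions in hbase:meta and the region directories on the filesystem, and its NOTE asks you to check whether the extras overlap existing valid regions. A rough Python sketch of that check, not HBCK2's implementation: the inputs (encoded names with start/end keys from `info:regioninfo`, and region directory names from an HDFS listing of the table directory) are assumed to be collected separately, the function names are hypothetical, and keys are compared as Python strings, which matches HBase's byte ordering only for ASCII keys.

```python
from typing import Dict, Iterable, List, Tuple

KeyRange = Tuple[str, str]  # (startkey, endkey); "" means open-ended, as in info:regioninfo


def ranges_overlap(a: KeyRange, b: KeyRange) -> bool:
    """Half-open ranges [start, end) overlap if each starts before the other ends."""
    (a_start, a_end), (b_start, b_end) = a, b
    a_starts_before_b_ends = b_end == "" or a_start < b_end
    b_starts_before_a_ends = a_end == "" or b_start < a_end
    return a_starts_before_b_ends and b_starts_before_a_ends


def classify_extra_regions(
    meta: Dict[str, KeyRange], fs_dirs: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split regions present in hbase:meta but missing a region dir into
    (overlapping a valid region, standalone).

    meta: encoded region name -> key range from info:regioninfo.
    fs_dirs: region directory names (encoded names) under the table dir.
    """
    on_fs = set(fs_dirs)
    valid = [r for r in meta if r in on_fs]
    overlapping, standalone = [], []
    for region in sorted(r for r in meta if r not in on_fs):
        if any(ranges_overlap(meta[region], meta[v]) for v in valid):
            overlapping.append(region)
        else:
            standalone.append(region)
    return overlapping, standalone
```

With the table directories removed entirely, as in this thread, the sketch reports every meta region of the table as a standalone extra, since there are no valid regions left to overlap.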

Explorer

I got this when I ran it:

ERROR: Unsupported command: extraRegionsInMeta

It seems our version of HBase may not support this command.

So I am left with the only option of trying to remove these rows from the meta table manually... and I am a complete newbie in this regard.