Created 12-08-2020 05:23 AM
I have inherited a problem.
48 regions in hbase:meta in transition.
Table has had data removed - most likely manually in an attempt to fix RIT issues. These RITs are probably a result of a network outage mid-operation.
Table is currently ENABLED and cannot be DISABLED (this has already been attempted by previous techie, which resulted in LOCKS/Procedures for DISABLE and DELETE as well as RITs).
Table is no longer required so can be deleted. HDFS reported it as being only 6k so I removed the table directories and zapped the znodes via ZK shell. This fixed the locks/procedures messages but CManager still reports 48 regions in transition and, as a result of this, balancing is not working.
What I need is a way to remove the rows from 'hbase:meta' as this is the only place where this table is still referenced.
Sample output:
alfa:rfilenameext column=table:state, timestamp=1604493139455, value=\x08\x00
alfa:rfilenameext,,1557760164826.35925292c25898671e5a894ce387e167. column=info:regioninfo, timestamp=1604388258225, value={ENCODED => 35925292c25898671e5a894ce387e167, NAME => 'alfa:rfilenameext,,1557760164826.35925292c25898671e5a894ce387e167.', STARTKEY => '', ENDKEY => '0'}
alfa:rfilenameext,,1557760164826.35925292c25898671e5a894ce387e167. column=info:seqnumDuringOpen, timestamp=1601269814633, value=\x00\x00\x00\x00\x00\x00\x008
alfa:rfilenameext,,1557760164826.35925292c25898671e5a894ce387e167. column=info:server, timestamp=1601269814633, value=ba-wtmp04.asgardalfa.hq.com:16020
alfa:rfilenameext,,1557760164826.35925292c25898671e5a894ce387e167. column=info:serverstartcode, timestamp=1601269814633, value=1601061167123
alfa:rfilenameext,,1557760164826.35925292c25898671e5a894ce387e167. column=info:sn, timestamp=1604388258050, value=ba-wtmp04.asgardalfa.hq.com,16020,1601061167123
alfa:rfilenameext,,1557760164826.35925292c25898671e5a894ce387e167. column=info:state, timestamp=1604388258225, value=CLOSED
alfa:rfilenameext,0,1557760164826.787d1455b84f2d846ce6089392f01fd2. column=info:regioninfo, timestamp=1600969938610, value={ENCODED => 787d1455b84f2d846ce6089392f01fd2, NAME => 'alfa:rfilenameext,0,1557760164826.787d1455b84f2d846ce6089392f01fd2.', STARTKEY => '0', ENDKEY => '1'}
alfa:rfilenameext,0,1557760164826.787d1455b84f2d846ce6089392f01fd2. column=info:seqnumDuringOpen, timestamp=1600780187722, value=\x00\x00\x00\x00\x00\x02^\xBB
alfa:rfilenameext,0,1557760164826.787d1455b84f2d846ce6089392f01fd2. column=info:server, timestamp=1600780187722, value=ba-wtmp08.asgardalfa.hq.com:16020
alfa:rfilenameext,0,1557760164826.787d1455b84f2d846ce6089392f01fd2. column=info:serverstartcode, timestamp=1600780187722, value=1600780162556
alfa:rfilenameext,0,1557760164826.787d1455b84f2d846ce6089392f01fd2. column=info:sn, timestamp=1600969938610, value=ba-wtmp07.asgardalfa.hq.com,16020,1600936054386
alfa:rfilenameext,0,1557760164826.787d1455b84f2d846ce6089392f01fd2. column=info:state, timestamp=1600969938610, value=OPENING
alfa:rfilenameext,1,1557760164826.aa9d89b40a9def31a080fdd1776acb4e. column=info:regioninfo, timestamp=1601060563980, value={ENCODED => aa9d89b40a9def31a080fdd1776acb4e, NAME => 'alfa:rfilenameext,1,1557760164826.aa9d89b40a9def31a080fdd1776acb4e.', STARTKEY => '1', ENDKEY => '2'}
alfa:rfilenameext,1,1557760164826.aa9d89b40a9def31a080fdd1776acb4e. column=info:seqnumDuringOpen, timestamp=1600780186976, value=\x00\x00\x00\x00\x00\x02^\xA9
alfa:rfilenameext,1,1557760164826.aa9d89b40a9def31a080fdd1776acb4e. column=info:server, timestamp=1600780186976, value=dr1-wtmp02.asgardalfa.hq.com:16020
alfa:rfilenameext,1,1557760164826.aa9d89b40a9def31a080fdd1776acb4e. column=info:serverstartcode, timestamp=1600780186976, value=1600780163021
alfa:rfilenameext,1,1557760164826.aa9d89b40a9def31a080fdd1776acb4e. column=info:sn, timestamp=1601060563980, value=ba-wtmp05.asgardalfa.hq.com,16020,1601049145467
I have been scanning various sources but these have not been very clear or relevant. For this problem I just want to remove all references (rows) which are related to alfa:rfilenameext table from the hbase:meta table. Whichever way this happens is of no importance.
However, there are other tables in existence on this cluster which are needed so I am not sure about a rebuild of the entire meta table.
Apologies in advance...I am a complete hbase newbie and was hoping there was a command such as:
delete 'alfa:rfilenameext' from 'hbase:meta'
which might serve to remove all rows for that table.
Created 12-08-2020 06:05 AM
Hello @TGH
Thanks for using Cloudera Community. You had Regions-In-Transition (RIT); the HDFS Directories for the Regions have been removed, along with the ZNodes, yet HBase still reports RIT. You wish to fix the RIT issue by removing the Meta Table entries, as RIT prevents the Balancer from running. In HBase v2 (CDH v6.3.x), the MasterProcWALs are critical for any Procedures that are stuck or blocked. You mentioned that a lot of Procedures (Disable|Delete) were observed.
The graceful way for your Team to handle this is to use the HBCK2 Tool. You can build HBCK2 using Link [1]. Next, you can use the HBCK2 Tool to bypass the Procedures (PIDs) associated with the Table for which the Region Directories have been removed. Once a PID is bypassed, the "Locks & Procedures" section of the HMaster UI shows the PID as "Bypass". After ensuring the required PIDs are bypassed, restart the HMaster Service and use the HBCK2 Tool to remove the Region entries in Meta for which the HDFS Region Directories have been removed. Use the "bypass" and "extraRegionsInMeta" HBCK2 Commands as documented in Link [1].
Alternatively, you can Stop HMaster > Remove MasterProcWALs (after confirming there are no RUNNABLE Procedures other than the PIDs associated with the Table for which the Region Directories have been removed) > Start HMaster. However, this isn't an ideal approach and you can encounter the "Master Is Initialising" issue, for which the HBCK2 Tool is required. The "Master Is Initialising" context is covered in Link [1] as well.
- Smarak
[1] https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
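The order of operations described above (bypass the stuck PIDs, restart the HMaster, then report and optionally fix extra regions in Meta) could be sketched roughly as follows. This is only an illustration that assembles command lines; it does not run anything. The jar path and the PIDs 101 and 102 are hypothetical placeholders, the `hbase hbck -j <jar>` launcher form is the one described in the HBCK2 README at Link [1], and whether `-o` is appropriate for a given procedure is an assumption to verify against that README.

```python
import shlex

# Hypothetical path to the HBCK2 jar built from Link [1]; adjust to your install.
HBCK2_JAR = "/path/to/hbase-hbck2.jar"


def hbck2(*args, jar=HBCK2_JAR):
    """Return the argv list for one HBCK2 invocation via `hbase hbck -j <jar>`."""
    return ["hbase", "hbck", "-j", jar, *args]


def cleanup_plan(table, stuck_pids, fix=False):
    """Build the HBCK2 commands for the bypass-then-clean-meta approach.

    Step 1: bypass the stuck procedures (skipped when no PIDs are given).
    (The HMaster restart between the two steps is a manual action.)
    Step 2: report extra regions in hbase:meta for `table`; add --fix to remove them.
    """
    steps = []
    if stuck_pids:
        steps.append(hbck2("bypass", "-o", *[str(pid) for pid in stuck_pids]))
    extra = ["extraRegionsInMeta"]
    if fix:
        extra.append("--fix")
    extra.append(table)
    steps.append(hbck2(*extra))
    return steps


# Example with made-up PIDs; prints the commands for review.
for argv in cleanup_plan("alfa:rfilenameext", [101, 102], fix=False):
    print(shlex.join(argv))
```

Running the report without `fix=True` first gives a chance to check the list of regions before anything is removed from Meta.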
Created 12-08-2020 11:46 PM
Hi, @smdas,
In my post I neglected to mention that (last week) I shut hbase, removed the masterprocwals, dfs removed the structure, zk'd the znodes, and restarted hbase. This removed all references to the affected table. There are (currently) no locks or procedures for this table.
The hbase UI reports those regions in transition - I assume only because they are contained in hbase:meta - but there is nothing in storage, nothing in zk, and CM shows no table details as that table is not there.
I have been poring over info regarding use of hbck2 and cannot seem to find anything resembling a suitable operation that will actually operate on hbase:meta rows, just general options for fixing meta as a kind of global operation.
Created 12-08-2020 06:20 AM
Hello @TGH
Sharing the Steps for building the HBCK2 Jar from the Git reference. Additionally, refer to the Post via [1] for details on building the HBCK2 Tool.
- Smarak
Created on 12-09-2020 03:08 AM - edited 12-09-2020 03:17 AM
Getting it and/or building it is not the problem. Does it have the facility to remove redundant meta information, and without corrupting other meta data for needed tables?
Why is 'delete' or 'deleteall' unsuitable, is there no way to specify removal of rows containing that table name? After all, if it is just a table like every other table then there must exist a normal database method to selectively remove data.
Is it possible to eradicate transition errors (in this case) by manually altering the state to OPENING -> OPEN -> CLOSING -> CLOSED... would this 'finalise' transition errors to the point they get removed from meta by regular hbase operation?
Created 12-09-2020 05:06 AM
Hello @TGH
Thanks for the response. To your queries,
(I) HBCK2 has extraRegionsInMeta for removing Regions from HBase:Meta that don't have any HDFS Directories. Running the HBCK2 Tool with this command lists the Regions in Meta that aren't present in HDFS, and adding the Fix flag (-f) removes them as well.
(II) Using the Delete Command on HBase:Meta isn't an issue in itself, yet we generally avoid making any changes to HBase:Meta manually. It's more of a recommendation, so that a manual oversight doesn't end up corrupting HBase:Meta.
(III) We can change the Region State via the HBCK2 setRegionState Command. Note that the HBCK2 Git Page recommends using this Command only as a last resort, considering its risky nature. If the Customer is aware of the risk associated with the Command, they can run it to set the TableState or RegionState.
- Smarak
Created 12-09-2020 06:34 AM
I downloaded and extracted hbase-operator-tools.1.0.0 via link.
Downloads\hbase-operator-tools-1.0.0-bin.tar\hbase-operator-tools-1.0.0\hbase-hbck2
The readme has no mention of 'extraregionsinmeta' so I am a bit confused as to where that comes from.
By the way, not sure if this is a factor but we have active and standby masters.
HBase Version: 2.1.0-cdh6.3.2
Created 12-11-2020 07:24 AM
Hello @TGH
I downloaded the HBCK2 Tool from the Steps shared & I could see the "extraRegionsInMeta" listed in the "README.md" file.
- Smarak
Created 12-14-2020 12:18 AM
Sorry, I see that the version of the README that is on this page
https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
has extraRegionsinMeta
extraRegionsInMeta <NAMESPACE|NAMESPACE:TABLENAME>...
Options:
-f, --fix fix meta by removing all extra regions found.
Reports regions present on hbase:meta, but with no related
directories on the file system. Needs hbase:meta to be online.
For each table name passed as parameter, performs diff
between regions available in hbase:meta and region dirs on the given
file system. Extra regions would get deleted from Meta
if passed the --fix option.
NOTE: Before deciding on use the "--fix" option, it's worth check if
reported extra regions are overlapping with existing valid regions.
If so, then "extraRegionsInMeta --fix" is indeed the optimal solution.
Otherwise, "assigns" command is the simpler solution, as it recreates
regions dirs in the filesystem, if not existing.
An example triggering extra regions report for tables 'table_1'
and 'table_2', under default namespace:
$ HBCK2 extraRegionsInMeta default:table_1 default:table_2
An example triggering extra regions report for table 'table_1'
under default namespace, and for all tables from namespace 'ns1':
$ HBCK2 extraRegionsInMeta default:table_1 ns1
Returns list of extra regions for each table passed as parameter, or
for each table on namespaces specified as parameter.
but the README that is contained in the .tgz file does NOT have this information and it is misleading to have it included in one and not within the actual src/bin distribution.
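To make the quoted description concrete: the "diff between regions available in hbase:meta and region dirs" amounts to a set difference on encoded region names, since each region directory on the filesystem is named after the region's encoded name. The sketch below only illustrates that idea in Python; it is not HBCK2's implementation, and the function name is made up for the example.

```python
def extra_regions_in_meta(meta_encoded_names, fs_region_dirs):
    """Return encoded region names listed in hbase:meta that have no
    matching region directory on the filesystem, in their original order."""
    on_disk = set(fs_region_dirs)
    return [name for name in meta_encoded_names if name not in on_disk]


# Encoded names taken from the hbase:meta sample posted earlier in this thread.
meta_regions = [
    "35925292c25898671e5a894ce387e167",
    "787d1455b84f2d846ce6089392f01fd2",
    "aa9d89b40a9def31a080fdd1776acb4e",
]
# The table directories were already removed from HDFS, so nothing is on disk.
fs_dirs = []

print(extra_regions_in_meta(meta_regions, fs_dirs))
```

With no directories left on disk, every region still listed in Meta is reported as extra, which matches the situation described in this thread.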
Command:
addFsRegionsMissingInMeta <NAMESPACE|NAMESPACE:TABLENAME>...
Options:
-d,--force_disable aborts fix for table if disable fails.
To be used when some regions may be missing from hbase:meta
but their directories are present in HDFS. This is a 'lighter'
version of 'OfflineMetaRepair' tool commonly used for similar
issues in hbase-1.x. This command needs hbase:meta to be online.
For each table name passed as parameter, it performs a diff
between regions available in hbase:meta and region dirs on HDFS.
Then for dirs with no hbase:meta matches, it reads the 'regioninfo'
metadata file and re-creates given region in hbase:meta. Regions are
re-created in 'CLOSED' state in the hbase:meta table, but not in the
Masters' cache, and they are not assigned either. To get these
regions online, run the HBCK2 'assigns'command printed when this
command-run completes.
NOTE: If using hbase releases older than 2.3.0, a rolling restart of
HMasters is needed prior to executing the provided 'assigns' command.
An example adding missing regions for tables 'tbl_1' in the default
namespace, 'tbl_2' in namespace 'n1' and for all tables from
namespace 'n2':
$ HBCK2 addFsRegionsMissingInMeta default:tbl_1 n1:tbl_2 n2
Returns HBCK2 an 'assigns' command with all re-inserted regions.
SEE ALSO: reportMissingRegionsInMeta
assigns [OPTIONS] <ENCODED_REGIONNAME>...
Options:
-o,--override override ownership by another procedure
A 'raw' assign that can be used even during Master initialization (if
the -skip flag is specified). Skirts Coprocessors. Pass one or more
encoded region names. 1588230740 is the hard-coded name for the
hbase:meta region and de00010733901a05f5a2a3a382e27dd4 is an example of
what a user-space encoded region name looks like. For example:
$ HBCK2 assign 1588230740 de00010733901a05f5a2a3a382e27dd4
Returns the pid(s) of the created AssignProcedure(s) or -1 if none.
bypass [OPTIONS] <PID>...
Options:
-o,--override override if procedure is running/stuck
-r,--recursive bypass parent and its children. SLOW! EXPENSIVE!
-w,--lockWait milliseconds to wait before giving up; default=1
Pass one (or more) procedure 'pid's to skip to procedure finish. Parent
of bypassed procedure will also be skipped to the finish. Entities will
be left in an inconsistent state and will require manual fixup. May
need Master restart to clear locks still held. Bypass fails if
procedure has children. Add 'recursive' if all you have is a parent pid
to finish parent and children. This is SLOW, and dangerous so use
selectively. Does not always work.
filesystem [OPTIONS] [<TABLENAME>...]
Options:
-f, --fix sideline corrupt hfiles, bad links, and references.
Report on corrupt hfiles, references, broken links, and integrity.
Pass '--fix' to sideline corrupt files and links. '--fix' does NOT
fix integrity issues; i.e. 'holes' or 'orphan' regions. Pass one or
more tablenames to narrow checkup. Default checks all tables and
restores 'hbase.version' if missing. Interacts with the filesystem
only! Modified regions need to be reopened to pick-up changes.
fixMeta
Regardless, if it does exist then it should do the job I want. I shall reveal the outcome after execution.
Created on 12-15-2020 03:33 AM - edited 12-15-2020 04:09 AM
I got this when I ran it:-
ERROR: Unsupported command: extraRegionsInMeta
It seems our version of HBASE may not support this command.
So I am left with the only option of trying to manually remove these rows from meta table...and I am a complete newbie in this regard.
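For the manual route, one possible approach is to generate one hbase shell `deleteall` command per hbase:meta row of the table and review the list before running anything. The Python sketch below assumes the scan output has one cell per line in the form `<rowkey> column=...` and that row keys contain only printable characters without quotes, backslashes or spaces; the helper names and the `alfa:other` sample row are hypothetical. Manual edits to hbase:meta carry the risk mentioned earlier in the thread, so treat this as a starting point rather than a vetted procedure.

```python
import re

# One cell line of `scan 'hbase:meta'` output: "<rowkey> column=<family:qualifier>, ..."
CELL_LINE = re.compile(r"^\s*(?P<row>\S+)\s+column=")


def meta_rows_for_table(scan_lines, table):
    """Return the distinct hbase:meta row keys belonging to `table`, first-seen order."""
    rows = []
    for line in scan_lines:
        match = CELL_LINE.match(line)
        if not match:
            continue
        row = match.group("row")
        # The table-state row is exactly the table name; region rows start with
        # "<table>,". Requiring the comma keeps e.g. 'alfa:rfilenameext2' out.
        if (row == table or row.startswith(table + ",")) and row not in rows:
            rows.append(row)
    return rows


def deleteall_commands(rows):
    """Build one hbase shell `deleteall` command per row key."""
    commands = []
    for row in rows:
        if "'" in row or "\\" in row:
            # Binary or quoted keys need different shell quoting; refuse to guess.
            raise ValueError("row key needs manual quoting: %r" % row)
        commands.append("deleteall 'hbase:meta', '%s'" % row)
    return commands


# Shortened sample; the last line is a made-up row for a different table.
sample = [
    r"alfa:rfilenameext column=table:state, timestamp=1604493139455, value=\x08\x00",
    "alfa:rfilenameext,,1557760164826.35925292c25898671e5a894ce387e167. column=info:state, timestamp=1604388258225, value=CLOSED",
    "alfa:rfilenameext,0,1557760164826.787d1455b84f2d846ce6089392f01fd2. column=info:state, timestamp=1600969938610, value=OPENING",
    "alfa:other,,1557760164826.ffffffffffffffffffffffffffffffff. column=info:state, timestamp=1, value=OPEN",
]

for command in deleteall_commands(meta_rows_for_table(sample, "alfa:rfilenameext")):
    print(command)
```

The printed commands can be pasted into the hbase shell one at a time, or saved to a file and passed to `hbase shell` as a script once they have been checked.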