Created on 08-25-2016 06:14 AM
New table region split/merge API
A new API in HBase (HDP 2.5) allows users to enable or disable automatic region splits and merges. From the HBase shell you can run the following commands:
Enable region splits
hbase> splitormerge_switch 'SPLIT', true
Disable region splits
hbase> splitormerge_switch 'SPLIT', false
Enable region merges
hbase> splitormerge_switch 'MERGE', true
Disable region merges
hbase> splitormerge_switch 'MERGE', false
Check region split switch status
hbase> splitormerge_enabled 'SPLIT'
Check region merge switch status
hbase> splitormerge_enabled 'MERGE'
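To drive these commands from a script instead of an interactive session, one option is to render them as text and pipe them to the HBase shell on standard input. The following is a minimal Python sketch under that assumption; the helper names (switch_command, status_command, run_in_hbase_shell) are hypothetical, and the sketch assumes the hbase launcher is on PATH.

```python
import subprocess

# The two switch types accepted by the shell commands above.
SWITCH_TYPES = ("SPLIT", "MERGE")


def switch_command(switch_type: str, enabled: bool) -> str:
    """Render a splitormerge_switch command (hypothetical helper)."""
    if switch_type not in SWITCH_TYPES:
        raise ValueError(f"unknown switch type: {switch_type!r}")
    value = "true" if enabled else "false"
    return f"splitormerge_switch '{switch_type}', {value}"


def status_command(switch_type: str) -> str:
    """Render a splitormerge_enabled command (hypothetical helper)."""
    if switch_type not in SWITCH_TYPES:
        raise ValueError(f"unknown switch type: {switch_type!r}")
    return f"splitormerge_enabled '{switch_type}'"


def run_in_hbase_shell(commands: list[str]) -> str:
    """Pipe commands to `hbase shell` on stdin and return its stdout.

    Assumes the `hbase` launcher is on PATH. The exit status is not relied
    on here, so inspect the returned text for error messages.
    """
    script = "\n".join(commands) + "\nexit\n"
    result = subprocess.run(
        ["hbase", "shell"], input=script, text=True, capture_output=True
    )
    return result.stdout
```

For example, run_in_hbase_shell([status_command("SPLIT"), status_command("MERGE")]) would report both switch states in one shell session, avoiding a separate shell start-up per command.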
Usage in HBase hbck tool
The HBase hbck tool can use this API automatically: it disables region splits and merges while it runs if the -disableSplitAndMerge command-line argument is specified or if the tool is run in repair mode. Disabling region splits and merges during repair or diagnostic runs improves the tool's ability to diagnose and repair an HBase cluster.
Usage in table snapshots
It is now recommended to disable both region splits and merges before you run the snapshot command. On large tables with many regions, splits and merges during a snapshot operation can cause the snapshot to fail in its verification phase, so it is recommended to disable them completely and restore their previous states after the snapshot operation:
hbase> splitormerge_switch 'SPLIT', false
hbase> splitormerge_switch 'MERGE', false
hbase> snapshot 'namespace:sourceTable', 'snapshotName'
hbase> splitormerge_switch 'SPLIT', true
hbase> splitormerge_switch 'MERGE', true
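The command sequence above re-enables both switches unconditionally, which restores the previous state only if both were enabled beforehand. A minimal Python sketch of a save, disable and restore pattern follows. SwitchClient is a hypothetical interface, not an HBase API: its set_switch method is assumed to set a switch and return the switch's previous state. FakeClient is an in-memory stand-in so the pattern can be exercised without a cluster.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol


class SwitchClient(Protocol):
    """Hypothetical client interface; not a real HBase API."""

    def set_switch(self, switch_type: str, enabled: bool) -> bool:
        """Set a switch and return its previous state."""
        ...


@contextmanager
def splits_and_merges_disabled(client: SwitchClient) -> Iterator[None]:
    """Disable SPLIT and MERGE, then restore each to its prior state."""
    previous: dict[str, bool] = {}
    try:
        for switch_type in ("SPLIT", "MERGE"):
            previous[switch_type] = client.set_switch(switch_type, False)
        yield
    finally:
        # Restore only the switches we set, even if the body raised.
        for switch_type, was_enabled in previous.items():
            client.set_switch(switch_type, was_enabled)


class FakeClient:
    """In-memory stand-in for a real connection (hypothetical)."""

    def __init__(self, split: bool, merge: bool) -> None:
        self.state = {"SPLIT": split, "MERGE": merge}

    def set_switch(self, switch_type: str, enabled: bool) -> bool:
        old = self.state[switch_type]
        self.state[switch_type] = enabled
        return old
```

With this pattern, a cluster where merges were already disabled keeps merges disabled after the snapshot, and the switches are restored even if the snapshot call raises an error.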
Usage during bulk data load
Bulk loads sometimes take a long time because the loader tool must split HFiles along new region boundaries. Why? During the operation some regions can be split or merged, and any prepared HFiles that cross the new boundaries must be split. HFile splitting is performed in a single JVM and may take substantial time. Meanwhile, further region splits and merges can occur, requiring yet more HFile splits. The resulting chain of events (region split/merge -> HFile splits -> region split/merge -> ...) can be very long. This is why the new split/merge API is important for HBase bulk data loads: disable splits and merges before you run a bulk load and restore their previous status afterwards.
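To make the HFile-splitting step concrete, the Python sketch below models regions by their sorted start keys and an HFile by its first and last row keys, and computes where the file would have to be cut. It is a simplified toy model, not the loader tool's code: keys are plain strings (HBase compares raw bytes), the first region is assumed to start at the empty key, and the names cut_points and split_rows are hypothetical.

```python
from bisect import bisect_right


def cut_points(start_keys: list[str], first: str, last: str) -> list[str]:
    """Region start keys at which an HFile spanning [first, last] must be cut.

    start_keys: sorted region start keys, beginning with "". A row belongs to
    the last region whose start key is <= the row, so the file needs a cut at
    every start key k with first < k <= last.
    """
    lo = bisect_right(start_keys, first)  # first start key strictly > first
    hi = bisect_right(start_keys, last)   # one past the last start key <= last
    return start_keys[lo:hi]


def split_rows(start_keys: list[str], rows: list[str]) -> list[list[str]]:
    """Group sorted row keys by region; each group becomes one HFile piece."""
    groups: dict[int, list[str]] = {}
    for row in rows:
        region = bisect_right(start_keys, row) - 1  # >= 0 because start_keys[0] == ""
        groups.setdefault(region, []).append(row)
    return [groups[i] for i in sorted(groups)]
```

For example, with regions starting at "", "g" and "p", a file covering "c" to "k" needs one cut at "g". If the region starting at "g" splits at "j" while the load is in progress, the same file now needs cuts at "g" and "j", which is the repeated re-splitting that disabling splits and merges avoids.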