New table region split/merge API

A new API in HBase (HDP 2.5) allows users to enable or disable automatic region splits and merges. From the HBase shell, you can run the following commands:

Enable region splits
hbase> splitormerge_switch 'SPLIT', true

Disable region splits
hbase> splitormerge_switch 'SPLIT', false

Enable region merges
hbase> splitormerge_switch 'MERGE', true

Disable region merges
hbase> splitormerge_switch 'MERGE', false

Check region split switch status
hbase> splitormerge_enabled 'SPLIT'

Check region merge switch status
hbase> splitormerge_enabled 'MERGE'

Usage in HBase hbck tool

The HBase hbck tool can use this API automatically during a restore operation if the -disableSplitAndMerge command-line argument is specified or if the tool is run in repair mode. Disabling region splits and merges during repair or diagnostic runs improves the tool's ability to diagnose and repair an HBase cluster.

Usage in table snapshots

It is now recommended to disable both region splits and merges before you run the snapshot command. On large tables with many regions, a split or merge during the snapshot operation causes the snapshot to fail in its verification phase, so disable both completely and restore their states after the snapshot operation:

hbase> splitormerge_switch 'SPLIT', false
hbase> splitormerge_switch 'MERGE', false
hbase> snapshot 'namespace:sourceTable', 'snapshotName'
hbase> splitormerge_switch 'SPLIT', true
hbase> splitormerge_switch 'MERGE', true
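
If the snapshot command fails partway through the sequence above, the switches stay off until someone turns them back on. One way to guard against that when scripting is sketched below in Python. It is only an illustration: the helper names (hbase_shell, splits_and_merges_disabled, snapshot_with_switches_off) are made up for this example, and it assumes the hbase launcher is on the PATH and that the shell's non-interactive mode (hbase shell -n) is available in your version.

```python
import subprocess
from contextlib import contextmanager


def hbase_shell(commands, runner=subprocess.run):
    """Feed commands to a non-interactive HBase shell.

    Assumption: `hbase` is on the PATH and `hbase shell -n` reads
    commands from stdin and exits non-zero when a command fails.
    """
    script = "\n".join(commands) + "\n"
    return runner(["hbase", "shell", "-n"], input=script, text=True, check=True)


@contextmanager
def splits_and_merges_disabled(shell=hbase_shell):
    """Turn both switches off, and turn them back on even if the body fails."""
    shell(["splitormerge_switch 'SPLIT', false",
           "splitormerge_switch 'MERGE', false"])
    try:
        yield
    finally:
        # Like the command sequence in the article, this re-enables both
        # switches unconditionally; it does not remember their earlier state.
        shell(["splitormerge_switch 'SPLIT', true",
               "splitormerge_switch 'MERGE', true"])


def snapshot_with_switches_off(table, snapshot_name, shell=hbase_shell):
    """Take a snapshot while region splits and merges are disabled."""
    with splits_and_merges_disabled(shell):
        shell([f"snapshot '{table}', '{snapshot_name}'"])
```

Calling snapshot_with_switches_off('namespace:sourceTable', 'snapshotName') would issue the same five commands as above. If a switch was already off before the run, check it first with splitormerge_enabled and skip the re-enable step by hand, since this sketch always turns both switches back on.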

Usage during bulk data load

Bulk loads sometimes take a long time because the loader tool must split HFiles along new region boundaries. Why? Because regions can be split or merged while the load is running, and any prepared HFile that crosses the new boundaries must itself be split. The HFile split is performed in a single JVM and may take substantial time. Meanwhile, region splits and merges can continue and will require further HFile splits. This chain of events (region split/merge -> HFile splits -> region split/merge -> ...) can be very long. This is why the new split/merge API is important during HBase bulk data loads: disable splits and merges before you run the bulk load and restore their status afterward.
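
To make the boundary problem concrete, here is a small Python model of the check described above. It is a simplified sketch, not the loader's actual code: regions are represented only by their sorted start keys (the first region starts at the empty key), an HFile only by its first and last row key, and the function names are invented for this example.

```python
from bisect import bisect_right


def region_index(start_keys, row_key):
    """Index of the region that holds row_key.

    start_keys is the sorted list of region start keys; the first
    region is assumed to start at the empty key b"".
    """
    return bisect_right(start_keys, row_key) - 1


def hfile_needs_split(start_keys, first_key, last_key):
    """True when the HFile's key range spans more than one region."""
    return region_index(start_keys, first_key) != region_index(start_keys, last_key)


# An HFile prepared for rows b"a" .. b"k" fits the first of two regions.
before = [b"", b"m"]
fits_before = not hfile_needs_split(before, b"a", b"k")

# If that region splits at b"f" during the load, the same HFile now
# crosses a boundary and has to be split before it can be loaded.
after = [b"", b"f", b"m"]
crosses_after = hfile_needs_split(after, b"a", b"k")
```

In this model fits_before and crosses_after are both True: the file was valid when it was prepared and became invalid only because a region split happened mid-load. With the split and merge switches off, the boundaries stay fixed for the duration of the load.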
