Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

HBase: Eliminating empty regions with an export-import approach

Contributor

We are trying to reduce the number of empty regions in a table (informs_search). The table has around 5,900 regions, thousands of them empty, and holds about 8 TB of data.

We tested an export-import approach on a sample data set (16,819,569 rows), using the steps below.

Back up informs_search

disable 'informs_search'

snapshot 'informs_search', 'informs_search_snpsht'

clone_snapshot 'informs_search_snpsht', 'informs_search_backup'

delete_snapshot 'informs_search_snpsht'

enable 'informs_search'

Export informs_search

/usr/hdp/current/hbase-client/bin/hbase org.apache.hadoop.hbase.mapreduce.Export 'informs_search' /db/support/hexport/inform_search_bk 1 1 1443738964000

Truncate informs_search

truncate 'informs_search'

Import informs_search

hbase org.apache.hadoop.hbase.mapreduce.Import 'informs_search' /db/support/hexport/inform_search_bk

Observations:

  • Before we ran these steps, we had 9 regions (6 + 3) across two region servers.
  • After we ran these steps, we had 2 regions on 1 region server.

----------------------------------------------------------------------------------------------------------------------------------

* In production, would running the same steps also leave us with only 2 regions?

* Is there any way to predict or configure the resulting number of regions and region servers?

* Also, how many major compactions will it take for the data to be distributed across the region servers (and regions)?

1 ACCEPTED SOLUTION

New Member

The final number of regions depends on the data at hand. If you want to control the number of regions, create a table with that many pre-split regions and then import into that table.
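As a rough illustration of the pre-split suggestion, the sketch below generates evenly spaced split keys and prints an HBase shell create statement that reads them through the SPLITS_FILE option. Everything specific here is an assumption, not something from the thread: the target of 8 regions, the table name informs_search_presplit, the column family cf, and above all the idea that row keys begin with uniformly distributed hex characters. Split points for a real table should be derived from its actual row key distribution.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print N-1 evenly spaced split keys (two hex digits each) for N regions.
# Assumes row keys start with uniformly distributed hex characters.
gen_splits() {
  local num_regions=$1
  local i
  for ((i = 1; i < num_regions; i++)); do
    printf '%02x\n' $(( i * 256 / num_regions ))
  done
}

# Hypothetical target of 8 regions; the splits file holds one key per line.
splits_file="${TMPDIR:-/tmp}/informs_search_splits.txt"
gen_splits 8 > "$splits_file"

# Statement to run inside `hbase shell` (printed here, not executed).
# The table name and the column family 'cf' are placeholders.
echo "create 'informs_search_presplit', 'cf', SPLITS_FILE => '$splits_file'"
```

The Import job would then be pointed at the pre-split table rather than at a freshly truncated one, so the imported rows land in the regions defined by the split keys.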


3 REPLIES

@vwunnava@hortonworks.com

Please avoid putting customer names here, as this is a public-facing forum. I am editing your question accordingly.

Removed customer name.