Created 10-02-2015 08:35 PM
We are trying to reduce the number of empty regions in a table (informs_search). This table has around 5,900 regions (including thousands of empty ones) and holds 8 TB of data.
We tested an export/import approach on a sample data set (16,819,569 rows), using the steps below.
Back up informs_search
disable 'informs_search'
snapshot 'informs_search', 'informs_search_snpsht'
clone_snapshot 'informs_search_snpsht', 'informs_search_backup'
delete_snapshot 'informs_search_snpsht'
enable 'informs_search'
Export informs_search
/usr/hdp/current/hbase-client/bin/hbase org.apache.hadoop.hbase.mapreduce.Export 'informs_search' /db/support/hexport/inform_search_bk 1 1 1443738964000
Truncate informs_search
truncate 'informs_search'
Import informs_search
hbase org.apache.hadoop.hbase.mapreduce.Import 'informs_search' /db/support/hexport/inform_search_bk
Observations:-
----------------------------------------------------------------------------------------------------------------------------------
* In production, after running the same procedure, would the table be reduced to 2 regions as well?
* Is there any way to predict or configure the resulting number of regions and region servers?
* Also, how many major compactions will it take for the data to be distributed across the region servers (and regions)?
Created 10-02-2015 10:26 PM
The final number of regions depends on the data at hand. If you want to constrain the number of regions, create a table that is pre-split into that many regions, and then import into it.
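As a rough sketch of the pre-split approach: the snippet below prints evenly spaced split keys that could be fed to the HBase shell `create` command through its `SPLITS_FILE` option. It assumes row keys start with a uniformly distributed two-digit hex prefix (00 to ff), which is only an assumption about the key design and is not stated in this thread. The function name `gen_splits`, the table name `informs_search_new`, the column family `cf`, and the file path under /tmp are all placeholders.

```shell
#!/usr/bin/env bash
# Sketch: print N-1 split keys for N regions.
# ASSUMPTION: row keys begin with a uniformly distributed 2-digit hex
# prefix (00..ff). Adjust the key space to match the real row key design.
gen_splits() {
  local regions=$1
  local i
  for ((i = 1; i < regions; i++)); do
    # Boundary i of N over the 256 possible prefixes, as 2-digit hex.
    printf '%02x\n' $(( i * 256 / regions ))
  done
}

# Example: 4 regions need 3 split keys.
gen_splits 4

# To use the keys (placeholder names, adapt before running):
#   gen_splits 4 > /tmp/informs_search_splits.txt
# then, in the HBase shell:
#   create 'informs_search_new', 'cf', SPLITS_FILE => '/tmp/informs_search_splits.txt'
# and finally import the earlier export into the pre-split table:
#   hbase org.apache.hadoop.hbase.mapreduce.Import 'informs_search_new' /db/support/hexport/inform_search_bk
```

If the real row keys are not hex-prefixed, the split points should instead be taken from the actual key distribution; evenly spaced keys over the wrong key space would leave most regions empty again.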
Created 10-04-2015 11:34 AM
Please avoid putting customer names here, as this is a public-facing forum. I am editing your question to remove it.
Created 10-04-2015 11:35 AM
Removed customer name.