How to manually manage the number of HBase regions?

avatar
Rising Star

http://hbase.apache.org/0.94/book/important_configurations.html suggests manually managing HBase region splits.

Do others in the community do this? If so:

  • Do you have a use case or example of the steps required (including setting hbase.hregion.max.filesize), and how/what you use to implement them?
  • Have you found it worthwhile in terms of effort versus benefits?

Thanks

1 ACCEPTED SOLUTION

avatar

@Emily Sharpe,

Managed Splitting

Usually HBase handles the splitting of regions automatically: once the regions reach the configured maximum size, they are split into two halves, which then can start taking on more data and grow from there. This is the default behavior and is sufficient for the majority of use cases.

There is one known problematic scenario, though, that can cause what is called split/compaction storms: when you grow your regions roughly at the same rate, eventually they all need to be split at about the same time, causing a large spike in disk I/O because of the required compactions to rewrite the split regions.

Rather than relying on HBase to handle the splitting, you can turn it off and manually invoke the split and major_compact commands. This is accomplished by setting the hbase.hregion.max.filesize for the entire cluster, or when defining your table schema at the column family level, to a very high number. Setting it to Long.MAX_VALUE is not recommended in case the manual splits fail to run. It is better to set this value to a reasonable upper boundary, such as 100 GB (which would result in a one-hour major compaction if triggered).
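
As a rough sketch (not from the original post), the upper boundary could be set at table-creation time with the 0.94-era Java admin API as shown below; the table name, column family, and 100 GB value are placeholders. The same limit can alternatively be set cluster-wide via hbase.hregion.max.filesize in hbase-site.xml.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateLargeRegionTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // Example schema; table and family names are placeholders.
    HTableDescriptor desc = new HTableDescriptor("testtable");
    desc.addFamily(new HColumnDescriptor("colfam1"));
    // 100 GB upper boundary instead of Long.MAX_VALUE, so a failed manual
    // split routine does not let regions grow without limit.
    desc.setMaxFileSize(100L * 1024 * 1024 * 1024);
    admin.createTable(desc);
    admin.close();
  }
}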

The advantage of running the commands to split and compact your regions manually is that you can time-control them. Running them staggered across all regions spreads the I/O load as much as possible, avoiding any split/compaction storm. You will need to implement a client that uses the administrative API to call the split() and majorCompact() methods. Alternatively, you can use the shell to invoke the commands interactively, or script their call using cron, for instance. Also see the RegionSplitter (added in version 0.90.2), discussed shortly, for another way to split existing regions: it has a rolling split feature you can use to carefully split the existing regions while waiting long enough for the involved compactions to complete (see the -r and -o command-line options).
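
For illustration only, a minimal maintenance client along those lines might look like the sketch below; the table name and the pause interval are made-up values, and both split() and majorCompact() are asynchronous requests, so a real tool would also poll region counts or compaction state before moving on.

import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class StaggeredRegionMaintenance {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // Walk the regions of one table and request a split for each in turn,
    // pausing between requests to spread the resulting compaction I/O.
    List<HRegionInfo> regions = admin.getTableRegions(Bytes.toBytes("testtable"));
    for (HRegionInfo region : regions) {
      admin.split(region.getRegionNameAsString());
      Thread.sleep(10 * 60 * 1000L); // placeholder interval between regions
    }
    // A table-wide major compaction can be requested the same way, at a
    // quiet time of day chosen by the operator.
    admin.majorCompact("testtable");
    admin.close();
  }
}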

An additional advantage to managing the splits manually is that you have better control over which regions are available at any time. This is good in the rare case that you have to do very low-level debugging, for example, to see why a certain region had problems. With automated splits it might happen that by the time you want to check into a specific region, it has already been replaced with two daughter regions. These daughter regions have new names, which makes tracing the evolution of the original region over longer periods of time much more difficult.

Region Hotspotting

Using the metrics, you can determine whether you are dealing with a write pattern that is causing a specific region to run hot.

If this is the case, you may need to salt the keys, or use random keys to distribute the load across all servers evenly.
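
A common salting approach (a generic sketch, not something prescribed by the guide) is to prefix each row key with a hash-derived bucket number so that otherwise sequential keys spread across regions; the bucket count and key below are arbitrary examples, and reads must then fan out over all buckets.

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedKeyExample {
  public static void main(String[] args) {
    int buckets = 16;                       // example bucket count
    String originalKey = "user-000001234";  // example sequential key
    // Derive a stable salt from the key and prepend it as a fixed-width prefix.
    int salt = (originalKey.hashCode() & Integer.MAX_VALUE) % buckets;
    byte[] saltedKey = Bytes.add(Bytes.toBytes(String.format("%02d-", salt)),
                                 Bytes.toBytes(originalKey));
    Put put = new Put(saltedKey); // the Put would then carry the actual columns
    System.out.println(Bytes.toString(put.getRow()));
  }
}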

The only way to alleviate the situation is to manually split a hot region into two or more new regions, at exact boundaries. This will divide the region’s load over multiple region servers. As you split a region you can specify a split key, that is, the row key at which to split the given region into two. You can specify any row key within that region, so you are also able to generate halves that are completely different in size.
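
With the admin API, such a targeted split might look like the following sketch; the table name and split key are placeholders, and the same effect is available interactively through the shell's split command.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class SplitHotRegion {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // Split at a chosen row key; both values are examples. The first argument
    // may be a table name or a full region name.
    admin.split("testtable", "row-150");
    admin.close();
  }
}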

This might help only when you are not dealing with completely sequential key ranges, because those are always going to hit one region for a considerable amount of time.

Table Hotspotting

Sometimes an existing table with many regions is not distributed well—in other words, most of its regions are located on the same region server. This means that, although you insert data with random keys, you still load one region server much more often than the others. You can use the move() function from the HBase Shell, or use the HBaseAdmin class, to explicitly move the server’s table regions to other servers. Alternatively, you can use the unassign() method or shell command to simply remove a region of the affected table from the current server. The master will immediately deploy it on another available server.
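
A sketch of what such a redistribution client could look like with the 0.94-era API is below; the table name is a placeholder, and passing null as the destination lets the master pick a target server.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class RedistributeTableRegions {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    for (HRegionInfo region : admin.getTableRegions(Bytes.toBytes("testtable"))) {
      // move() takes the encoded region name; a null destination lets the
      // master choose a server. unassign() would instead drop the region and
      // let the master redeploy it elsewhere.
      admin.move(Bytes.toBytes(region.getEncodedName()), null);
      // admin.unassign(region.getRegionName(), false);
    }
    admin.close();
  }
}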

Presplitting Regions

Managing the splits is useful to tightly control when load is going to increase on your cluster. You still face the problem that when initially loading a table, you need to split the regions rather often, since you usually start out with a single region per table. Growing this single region to a very large size is not recommended; therefore, it is better to start with a larger number of regions right from the start. This is done by presplitting the regions of an existing table, or by creating a table with the required number of regions.

The createTable() method of the administrative API, as well as the shell’s create command, both take a list of split keys, which can be used to presplit a table when it is created. HBase also ships with a utility called RegionSplitter, which you can use to create a presplit table. Starting it without a parameter will show usage information:

$ ./bin/hbase org.apache.hadoop.hbase.util.RegionSplitter

usage: RegionSplitter
  -c       Create a new table with a pre-split number of regions
  -D       Override HBase Configuration Settings
  -f       Column Families to create with new table. Required with -c
  -h       Print this usage help
  -o       Max outstanding splits that have unfinished major compactions
  -r       Perform a rolling split of an existing region
  --risky  Skip verification steps to complete quickly. STRONGLY DISCOURAGED for production systems.

By default, it uses the MD5StringSplit class to partition the row keys into ranges. You can define your own algorithm by implementing the SplitAlgorithm interface provided, and handing it into the utility using the -D split.algorithm= parameter. An example of using the supplied split algorithm class and creating a presplit table is:

$ ./bin/hbase org.apache.hadoop.hbase.util.RegionSplitter \
    -c 10 testtable -f colfam1

In the web UI of the master, you can click on the link with the newly created table name to see the generated regions:

testtable,,1309766006467.c0937d09f1da31f2a6c2950537a61093.
testtable,0ccccccc,1309766006467.83a0a6a949a6150c5680f39695450d8a.
testtable,19999998,1309766006467.1eba79c27eb9d5c2f89c3571f0d87a92.
testtable,26666664,1309766006467.7882cd50eb22652849491c08a6180258.
testtable,33333330,1309766006467.cef2853e36bd250c1b9324bac03e4bc9.
testtable,3ffffffc,1309766006467.00365940761359fee14d41db6a73ffc5.
testtable,4cccccc8,1309766006467.f0c5045c304c2ff5338be27e81ae698e.
testtable,59999994,1309766006467.2d854f337aa6c09232409f0ba1d4964b.
testtable,66666660,1309766006467.b1ec9df9fd90d91f54cb18da5edc2581.
testtable,7333332c,1309766006468.42e179b78663b64401079a8601d9bd06.

Or you can use the shell’s create command:

hbase(main):001:0> create 'testtable', 'colfam1', \
  { SPLITS => ['row-100', 'row-200', 'row-300', 'row-400'] }

0 row(s) in 1.1670 seconds

This generates the following regions:

testtable,,1309768272330.37377c4ab0a944a326ba8b6596a29396.
testtable,row-100,1309768272331.e6092cc777f58a08c61bf081aba14916.
testtable,row-200,1309768272331.63c9630a79b37ebce7b58cde0235dfe5.
testtable,row-300,1309768272331.eead6ad2ff3303ffe6a3126e0df3ff7a.
testtable,row-400,1309768272331.2bee7417fa67e4ac8c7210ce7325708e.
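
The equivalent presplit creation through the Java createTable() call might look like the following sketch, reusing the table, family, and split keys from the shell example above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class CreatePresplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("testtable");
    desc.addFamily(new HColumnDescriptor("colfam1"));
    // Same boundaries as the shell example; each key starts a new region.
    byte[][] splitKeys = new byte[][] {
        Bytes.toBytes("row-100"), Bytes.toBytes("row-200"),
        Bytes.toBytes("row-300"), Bytes.toBytes("row-400")
    };
    admin.createTable(desc, splitKeys);
    admin.close();
  }
}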

As for the number of presplit regions to use, you can start low with 10 presplit regions per server and watch as data grows over time. It is better to err on the side of too few regions and use a rolling split later, as having too many regions is usually not ideal with regard to overall cluster performance.

Alternatively, you can determine how many presplit regions to use based on the largest store file in your region: with a growing data size, this will get larger over time, and you want the largest region to be just big enough so that it is not selected for major compaction, or you might face the compaction storms mentioned earlier.

If you presplit your regions too thin, you can increase the major compaction interval by increasing the value of the hbase.hregion.majorcompaction configuration property. If your data size grows too large, use the RegionSplitter utility to perform a network-I/O-safe rolling split of all regions.

Use of manual splits and presplit regions is an advanced concept that requires a lot of planning and careful monitoring. On the other hand, it can help you avoid the compaction storms that can occur with uniform data growth, or shed the load of hot regions by splitting them manually.

Hope this helps you understand how to manually manage HBase region splits.


6 REPLIES

avatar

@Emily Sharpe, please review answers to your questions, and if they are acceptable, mark them as "accepted" so the responder can get credit. Otherwise, ask clarifying questions in comments so you can get your question answered. Thanks.

avatar
Rising Star

Thanks @Rushikesh Deshmukh for your response.

Using these, how would you recommend 'correcting' an existing store of data? Compactions reduce the number of files per region, but how would you reduce the number of existing regions? Is this possible with the current state of the merge tools?

avatar
Master Mentor

The 0.94 reference guide is old; refer to the current guide on the HBase site.

avatar

@Emily Sharpe, if the original question is answered, please accept the best answer.