HBase Major compaction not running or not fully compacting regions

New Contributor

We are using CDH 5.3.3, and some regions do not seem to be undergoing major compaction as expected.

 

Most of our writes are done via bulk import, and minor compactions run as expected. Since we have frequent write operations, and in order to avoid running minor compactions too often, we have added the following properties to the advanced configuration snippet (following guidance from Cloudera Support):

 

<property>
  <name>hbase.server.compactchecker.interval.multiplier</name>
  <value>80</value>
  <description>The number that determines how often we scan to see if compaction is necessary. Normally, compactions are done after some events (such as memstore flush), but if region didn't receive a lot of writes for some time, or due to different compaction policies, it may be necessary to check it periodically. The interval between checks is hbase.server.compactchecker.interval.multiplier multiplied by hbase.server.thread.wakefrequency.</description>
</property>
<property>
  <name>hbase.regionserver.thread.compaction.large</name>
  <value>4</value>
</property>
<property>
  <name>hbase.regionserver.thread.compaction.small</name>
  <value>6</value>
</property>
<property>
  <name>hbase.hstore.compaction.max.size</name>
  <value>536870912</value>
</property>

 

Major compactions are scheduled to run every 7 days.
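
For reference, a 7-day schedule corresponds to the hbase.hregion.majorcompaction interval; expressed in the same configuration-snippet format it would look like the sketch below (604800000 ms = 7 days, which is also the stock default, so this is shown only for illustration):

<property>
  <name>hbase.hregion.majorcompaction</name>
  <!-- 7 days in milliseconds; the stock default, shown here only for illustration -->
  <value>604800000</value>
</property>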

 

We currently have regions with a few dozen store files, but in a few problematic cases a region has more than 1000 files.

 

 

We would like to understand why this is happening and how we can prevent it. We have four main questions:

 

1) How can we determine why certain regions are not properly performing major compactions (is there any indication in the logs, etc.)?

 

2) Which metrics should we monitor to ensure good HBase health (the current situation does not seem very healthy, yet HBase is not reporting anything in Cloudera Manager)?

 

3) We have a few regions which are more problematic, since they contain more than 1000 store files (which account for a few TB of data in a single region). Is it safe to manually trigger a major compaction of such regions, and if so, do we need enough free storage headroom to ensure the compaction succeeds?

 

4) We assume data insertion can continue while these large major compactions are running. Is this correct?

 

Thanks for any feedback or support.

5 REPLIES

Master Collaborator

Well, if you do a lot of tiny bulk loads, that could help explain the large number of files, especially if you don't pre-split tables prior to bulk loading.

 

Why do they stay there and not trigger a major compaction? I think we'd need to know file sizes/timestamps to speculate further.

Trying to answer your questions:

1) How can we determine why certain regions are not properly performing major compactions (is there any indication in the logs, etc.)? The best thing I've found so far is checking the number of files in HDFS. We run compactions nightly during off-peak hours, looking for regions with more than X HFiles and triggering a major compaction for those regions; weekly, we run the same full major compaction. This helps keep on top of the trouble regions.
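
A rough sketch of that nightly check, in case it helps (this is not our actual script; the table path, the threshold, and the assumption that major_compact in the shell accepts a region name are all things to verify on your cluster):

#!/bin/bash
# Count HFiles per region under the table's HDFS directory and major-compact
# any region that has more than THRESHOLD files. Assumes the default HBase
# 0.98 layout: /hbase/data/<namespace>/<table>/<encoded-region>/<cf>/<hfile>
TABLE_DIR=/hbase/data/default/mytable   # placeholder table
THRESHOLD=20                            # illustrative threshold

hdfs dfs -ls -R "$TABLE_DIR" | awk '{print $NF}' | \
  awk -F/ 'NF==8 {print $(NF-2)}' | sort | uniq -c | \
while read count region; do
  if [ "$count" -gt "$THRESHOLD" ]; then
    # major_compact accepts a table name or a region name in the shell;
    # check the exact region-name form your HBase version expects
    echo "major_compact '$region'" | hbase shell
  fi
done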

 

2) Which metrics should we monitor to ensure good HBase health? Read/write performance would seem to be the ultimate measure of health. We've had the best luck monitoring this on the application side. This is really a giant topic and worthy of many blog posts; I'd recommend Lars Hofhansl's blog, which is almost a must-read for HBase admins IMHO: http://hadoop-hbase.blogspot.com/
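
A couple of quick command-line checks can also help surface trouble regions (both commands are standard HBase tooling; what counts as "too many" store files is up to you):

# Per-region load details, including store file counts, from the shell
echo "status 'detailed'" | hbase shell | grep -i storefiles

# Table/region consistency check
hbase hbck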

 

3) We have a few regions which are more problematic, since they contain more than 1000 store files (which account for a few TB of data in a single region). Is it safe to manually trigger a major compaction of such regions, and if so, do we need enough free storage headroom to ensure the compaction succeeds? Depending on the file sizes, you can go OOM from this. To maximize the chances of success, put the region on a region server all by itself and give it enough heap to hold the resulting store file in memory. You will then likely trigger a few splits after compaction, based on your maximum region size configuration.
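
A hedged sketch of how that isolation could be done from the hbase shell (the region name and server name below are placeholders; take the real values from the master UI):

# disable the balancer so it doesn't move the region back
balance_switch false
# move the region (encoded region name) to a region server sized for the job
move 'ENCODED_REGION_NAME', 'host.example.com,60020,1428919520000'
# trigger the major compaction; depending on your HBase version this takes a
# table name or a region name
major_compact 'REGION_NAME'
# re-enable the balancer afterwards
balance_switch true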

 

4) We assume data insertion can continue while these large major compactions are running. Is this correct? Yes; however, if a store ever has more than hbase.hstore.blockingStoreFiles files, writes are severely limited. I suspect that is the case for your 1000-storefile region, so you are already impacted. Read here for more on compactions: http://hadoop-hbase.blogspot.com/2014/07/about-hbase-flushes-and-compactions.html
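
For reference, this is the setting in question; the value shown is the usual out-of-the-box default for this HBase line, so verify what your cluster actually uses:

<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <!-- once a store has more than this many HFiles, flushes are delayed
       (up to hbase.hstore.blockingWaitTime) until compaction catches up -->
  <value>10</value>
</property>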

New Contributor

Thank you for your reply. It is much appreciated.


@ben.hemphill wrote:

Why do they stay there and not trigger a major compaction? I think we'd need to know file sizes/timestamps to speculate further.


Minor compactions are still happening, so given our advanced configuration the store files are mostly slightly larger than 500 MB. Our region size is configured as 15 GB, so there is one large store file of a few GB. Some of the store files have timestamps several months old (clearly showing that major compactions were not happening).

 

According to the documentation "hbase.hstore.compaction.max.size" should only affect minor compactions. Do you have any experience to confirm this fact?

 


@ben.hemphill wrote:

 

1) How can we determine why certain regions are not properly performing major compactions (is there any indication in the logs, etc.)? The best thing I've found so far is checking the number of files in HDFS. We run compactions nightly during off-peak hours, looking for regions with more than X HFiles and triggering a major compaction for those regions; weekly, we run the same full major compaction. This helps keep on top of the trouble regions.

 


Is this a manual process, or do you use some kind of automation (a configuration in HBase?)?

 


@ben.hemphill wrote:

3) We have a few regions which are more problematic, since they contain more than 1000 store files (which account for a few TB of data in a single region). Is it safe to manually trigger a major compaction of such regions, and if so, do we need enough free storage headroom to ensure the compaction succeeds? Depending on the file sizes, you can go OOM from this. To maximize the chances of success, put the region on a region server all by itself and give it enough heap to hold the resulting store file in memory. You will then likely trigger a few splits after compaction, based on your maximum region size configuration.


From our experience, it seems the region split only happens after the major compaction. Regarding going OOM, do you know whether it tries to keep the full temporary store file (created during the compaction) in memory, or just the store file being merged at each moment?

Given the total size of the store files in the most problematic regions (over 1 TB) and our region size configuration (15 GB), the outcome of the major compaction would be something like 100 new regions. Is this something a major compaction can handle?

 

We are currently considering pre-splitting the problematic regions before running the compaction, but we have no experience with such a procedure. Any recommendations?

Master Collaborator

For the first two questions:

 

hbase.hstore.compaction.max.size is just an upper limit on the size of a file that HBase will consider for compaction.

 

The compaction script we wrote is run by cron on the masters. The script checks whether the machine it is running on is the current active master; if so, it continues, and if not, it exits. This ensures only one process is scheduling compactions. This is NOT built into HBase. I will check whether my company will allow us to open-source it.
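
Not the actual script, but a minimal sketch of the wrapper idea (the active-master check via the /hbase/master znode and the script path are assumptions for illustration, not necessarily how ours works):

#!/bin/bash
# Run from cron on every master host; only the active master proceeds.
# The /hbase/master znode holds the serialized ServerName of the active master,
# so grepping its dump for our own hostname is a crude but workable check.
if hbase zkcli get /hbase/master 2>/dev/null | grep -q "$(hostname -f)"; then
  /opt/scripts/nightly_region_compact.sh   # placeholder for the real compaction job
else
  exit 0                                   # not the active master, nothing to do
fi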


For the last one:

From our experience, it seems the region split only happens after the major compaction. {This is correct; I was indicating that I would expect the region to split many times after you compacted.}

Regarding going OOM, do you know whether it tries to keep the full temporary store file (created during the compaction) in memory, or just the store file being merged at each moment? {Stored in memory; compaction has to take care of all the deletes and merge all of the versions. Since you can't modify files in HDFS (only append), this has to be held in memory.}

Given the total size of the store files in the most problematic regions (over 1 TB) and our region size configuration (15 GB), the outcome of the major compaction would be something like 100 new regions. Is this something a major compaction can handle? {It would first write out the multi-TB region, then splits would be triggered. As you point out, that is problematic, since you likely don't have a machine with over a TB of memory hanging around.}

 

We are currently considering pre-splitting the problematic regions before running the compaction, but we have no experience with such a procedure. Any recommendations? {Pre-splitting implies you create the appropriate number of regions before loading data. Now that you have all of this data in one region, there isn't any great built-in way to split it.}

 

What you could do to fix it at this point is export the table -> pre-split a new table -> import into the new table.
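
A rough outline of that path (table names, column family, and split points are placeholders):

# 1) Export the existing table to HDFS (runs as a MapReduce job)
hbase org.apache.hadoop.hbase.mapreduce.Export oldtable /tmp/oldtable_export

# 2) In the hbase shell, create the new table pre-split into regions:
#    create 'newtable', 'cf', SPLITS => ['row01000', 'row02000', 'row03000']

# 3) Import the exported data into the pre-split table (also a MapReduce job)
hbase org.apache.hadoop.hbase.mapreduce.Import newtable /tmp/oldtable_export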

 

 

New Contributor

Thanks. The export/import idea is very interesting. We will further investigate this.

 

Regarding the max compaction size, do you think this could be the reason why our scheduled major compactions are not taking place (i.e., it is simply ignoring the files larger than 500 MB, so it thinks there is nothing to compact)?

Master Collaborator

Glad to help! 

 

Max compaction size config (hbase.hstore.compaction.max.size): *edit* it looks like instead of the default you are setting that to 512 MB. Yes, that is certainly at least part of the issue; it effectively means that compaction will ignore any store file larger than 512 MB. I'm unsure what that will do to the ability to split when necessary. It's not something we set on our clusters.
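
For comparison, lifting the cap would mean going back to something like the following (Long.MAX_VALUE is the usual out-of-the-box default, meaning no store file is excluded from compaction selection by size; whether that suits your bulk-load pattern is a separate question):

<property>
  <name>hbase.hstore.compaction.max.size</name>
  <!-- Long.MAX_VALUE, the usual default: no storefile is excluded by size -->
  <value>9223372036854775807</value>
</property>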

 

Leaving this here for others:
If you are relying on HBase to do the weekly major compaction (hbase.hregion.majorcompaction), there is a difference in behavior between an externally initiated compaction and an internal, system-initiated one. The system-initiated compaction (hbase.hregion.majorcompaction) seems to trigger only a minor compaction when a store has more files than a minor compaction will consider (hbase.hstore.compaction.max). I am guessing this is due to a desire not to impact the system with a very long-running major compaction. In your case, you will be constantly triggering only a minor compaction of at most that many store files every time HBase considers that region for compaction (every hbase.server.compactchecker.interval.multiplier multiplied by hbase.server.thread.wakefrequency). This is especially true if you generate more HFiles than hbase.hstore.compaction.max in the time it takes to do one check interval plus the compaction itself.
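
To put rough numbers on that check interval (assuming the stock hbase.server.thread.wakefrequency of 10000 ms together with the multiplier of 80 from your snippet):

80 (interval.multiplier) * 10000 ms (thread.wakefrequency) = 800000 ms, i.e. roughly 13 minutes between periodic checks.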


An externally initiated compaction, either through the hbase shell or through the API, sets the compaction priority to high and does not consider hbase.hstore.compaction.max.
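
For example, an external trigger from the shell is as simple as (table name is a placeholder):

echo "major_compact 'mytable'" | hbase shell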