Any blocking during HBase compaction?

Probably a real simple question, but I can't seem to find the answer.

What, if anything, is the impact on the availability of a region during HBase maintenance tasks such as major/minor compaction, region splits/merges, etc.?

For example, can we read from and write to a region while it is being compacted, or will requests be blocked until the operation has completed?

Thank you

1 ACCEPTED SOLUTION

Master Guru

I am pretty sure it is not correct that HBase is blocked during major compactions. I looked for a definitive statement and didn't find one, but I am very sure that you can still read from and write to a region during a major compaction. There will, however, be a heavy impact on IO and CPU on the region servers while the store files are rewritten, which is why major compactions are normally scheduled during the night. If I am wrong on this one, please clarify.

Region splits are essentially immediate, since a region is split into two logically and the data is only rewritten during the next compactions. So there may be some impact, but it should be very brief.

Region merging is an interesting question; I am not aware of the process for this.
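
To make the night-time scheduling idea concrete, here is a minimal Java sketch that only works out how long to wait until the next off-peak window. The class name, the method name and the 02:00 window are made up for illustration, and nothing here talks to HBase. As far as I know, time-based major compactions can be turned off by setting hbase.hregion.majorcompaction to 0, and a job like this could then request one through the client's Admin.majorCompact(TableName), but check both against the documentation for your HBase version.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

public class OffPeakCompactionScheduler {

    /** Time from 'now' until the next occurrence of 'windowStart' (today or tomorrow). */
    static Duration delayUntilNextWindow(LocalDateTime now, LocalTime windowStart) {
        LocalDateTime next = now.toLocalDate().atTime(windowStart);
        if (!next.isAfter(now)) {
            // Today's window start has already passed (or is exactly now): use tomorrow's.
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    public static void main(String[] args) {
        LocalTime windowStart = LocalTime.of(2, 0); // hypothetical off-peak start
        Duration delay = delayUntilNextWindow(LocalDateTime.now(), windowStart);
        System.out.println("Next major compaction request in "
                + delay.toMinutes() + " minutes");
        // A real job would wait for 'delay' and then ask HBase to compact the table.
        // That call is deliberately left out of this sketch.
    }
}
```

The sketch uses local wall-clock time with no time zone handling, so treat the result as approximate around daylight-saving changes.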

8 REPLIES

Expert Contributor

If an HBase table is undergoing major compaction, clients may see very low read/write throughput, and eventually they may hit connection timeouts until the major compaction is over.

During minor compaction, the table remains available for reads and writes.

For more details, refer to this link

New Contributor

We are facing issues with region availability, and they seem to be due to compactions. When we try to access a region, we get the exception org.apache.hadoop.hbase.NotServingRegionException: Region is not online. But when we checked the corresponding region server logs, we saw a lot of compactions happening on the table. Does a table become inaccessible during compaction? Is there a way to reduce the number of compactions through some setting?

Expert Contributor

I haven't looked at the actual source code for major compaction. In practice, though, I have not seen any HBase client able to perform a transaction during a major compaction.

Master Guru

That is very weird. After all, a minor compaction sometimes gets promoted to a major compaction, so it would be pretty catastrophic if that made HBase inaccessible. It is also never mentioned anywhere. I totally agree that there will be a performance impact, of course.

http://www.ngdata.com/visualizing-hbase-flushes-and-compactions/

  • during flushes & compactions, HBase keeps processing put and get requests, always giving a consistent view of the data

Expert Contributor

You might be right. In my previous experience with HBase (with high write throughput requirements), clients timed out every time and could not re-establish a connection until the major compaction was over. (To be precise, the connection was not blocked or lost as soon as the major compaction started; it gradually died, and clients could not reconnect until the major compaction was over.) It might be a side effect.

avatar

Thank you both for your replies. I too have been unable to find a definitive statement about the availability of a table or region during major compaction. I understand that there will be an impact on IO/CPU and plan to schedule major compactions on weekends (or other periods of lower activity), but for a 24/7 application I need to understand whether the application will be unavailable or blocked during the minutes(?) of compaction.

Master Guru

I am sure that there is no outage during a major compaction. A compaction writes new store files while the old files still exist, and then the files are switched out. I don't think that basic process differs between minor and major compactions. The difference is that a major compaction takes all store files and also removes deleted rows, so it has more impact on the cluster. Sometimes, when all files are selected for a minor compaction, HBase will do a major compaction anyway. So unless an HBase committer jumps in and tells me otherwise, there is no outage during a major compaction.

http://www.slideshare.net/cloudera/hbasecon-2013-compaction-improvements-in-apache-hbase
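
The "write new files while the old ones still exist, then switch" idea can be sketched as a toy in-memory model. This is not HBase code: the class and method names are invented, each "store file" is just an immutable map, deletes are ignored, and it assumes only one compaction runs at a time. It is only meant to show why readers are not blocked by the rewrite itself, since they always read from whichever file list is currently published.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

public class CompactionSwapSketch {

    // Published list of immutable "store files"; newer files come later in the list.
    private final AtomicReference<List<Map<String, String>>> storeFiles =
            new AtomicReference<>(List.of());

    /** Simulates a flush: publishes one more immutable file next to the existing ones. */
    void flush(Map<String, String> cells) {
        Map<String, String> file = Map.copyOf(cells);
        storeFiles.updateAndGet(old -> {
            List<Map<String, String>> next = new ArrayList<>(old);
            next.add(file);
            return List.copyOf(next);
        });
    }

    /** Reads the current file list without locking; the newest file holding the key wins. */
    String get(String key) {
        List<Map<String, String>> snapshot = storeFiles.get();
        for (int i = snapshot.size() - 1; i >= 0; i--) {
            String value = snapshot.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Merges the files that exist right now into one, then swaps them out atomically. */
    void compact() {
        List<Map<String, String>> inputs = storeFiles.get();
        Map<String, String> merged = new TreeMap<>();
        for (Map<String, String> file : inputs) {
            merged.putAll(file); // later (newer) files overwrite older values
        }
        Map<String, String> compacted = Map.copyOf(merged);
        // Readers keep using the old list until this single swap happens.
        storeFiles.updateAndGet(current -> {
            List<Map<String, String>> next = new ArrayList<>();
            next.add(compacted);
            // Keep any files that were flushed while the merge was running.
            next.addAll(current.subList(inputs.size(), current.size()));
            return List.copyOf(next);
        });
    }

    int fileCount() {
        return storeFiles.get().size();
    }

    public static void main(String[] args) {
        CompactionSwapSketch store = new CompactionSwapSketch();
        store.flush(Map.of("row1", "a"));
        store.flush(Map.of("row1", "b", "row2", "c"));
        store.compact();
        System.out.println(store.get("row1") + " from " + store.fileCount() + " file(s)");
    }
}
```

In this model the cost of compaction is the merge work itself, which is consistent with the IO/CPU impact discussed above, while the switch is a single reference update.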