Any blocking during HBase compaction?
Labels: Apache Ambari, Apache HBase
Created 04-20-2016 10:08 PM
Probably a real simple question, but I can't seem to find the answer.
What, if anything, is the impact on the availability of a region during HBase maintenance tasks such as major/minor compactions, region splits/merges, etc.?
For example, can we read from and write to a region while it is compacting, or will those requests be blocked until the operation has completed?
Thank you
Created 04-21-2016 01:16 PM
I am pretty sure it is not correct that HBase is blocked during major compactions. I tried to find a definitive statement and didn't find one, but I am very sure that you can still read from and write to a region during a major compaction. However, there will be a heavy impact on IO and CPU on the region servers while the store files are rewritten, so major compactions are normally scheduled at night. If I am wrong on this one, please clarify.
Region splits are essentially immediate, since a region is split logically into two and the data is only rewritten during the next compactions. So there may be some impact, but it should be very brief.
Region merging is an interesting question; I am not aware of the process for this.
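The logical split described in this answer can be pictured with a toy model. The sketch below is not HBase code: the `Reference`, `split`, and `compact` names are hypothetical, and a store file is simplified to a plain dictionary. It only illustrates the idea that each daughter region initially points at a key range of the parent's store file, and that the data is physically rewritten when the daughter is next compacted.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Reference:
    """Hypothetical stand-in for a daughter region's view of a parent store file.

    No data is copied at split time; the reference only records a key range.
    """
    parent_file: Dict[str, str]
    start: Optional[str]  # inclusive lower bound, None means unbounded
    stop: Optional[str]   # exclusive upper bound, None means unbounded

    def covers(self, key: str) -> bool:
        if self.start is not None and key < self.start:
            return False
        if self.stop is not None and key >= self.stop:
            return False
        return True

    def get(self, key: str) -> Optional[str]:
        # Reads go through to the parent's file, restricted to this half.
        return self.parent_file.get(key) if self.covers(key) else None


def split(parent_file: Dict[str, str], split_key: str) -> Tuple[Reference, Reference]:
    """Split logically: two references over the same underlying file."""
    return (Reference(parent_file, None, split_key),
            Reference(parent_file, split_key, None))


def compact(ref: Reference) -> Dict[str, str]:
    """Rewrite the referenced half into a standalone file for the daughter."""
    return {k: v for k, v in ref.parent_file.items() if ref.covers(k)}
```

In this model the split itself does no data movement, which matches the "essentially immediate" description; the cost is deferred to `compact`.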
Created 04-20-2016 10:21 PM
If an HBase table is undergoing a major compaction, clients may see very low read/write throughput. Eventually clients may hit connection timeouts until the major compaction is over.
In the case of a minor compaction, the table remains available for reads and writes.
For more details, refer to this link
Created 09-29-2020 05:27 AM
We are facing issues with region availability, and it seems to be due to compactions. We get the exception below when we try to access a region:
org.apache.hadoop.hbase.NotServingRegionException: Region is not online
When we checked the corresponding region server logs, we could see a lot of compactions happening on the table. Does a table become inaccessible during compaction? Is there a way to reduce the number of compactions through some setting?
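On the settings question, a few `hbase-site.xml` properties are commonly adjusted to influence how often compactions run. The sketch below only renders such properties as XML. The property names are real HBase settings, but the values shown are illustrative assumptions, not recommendations, and defaults differ between HBase versions. Whether compactions are actually behind the `NotServingRegionException` above is not established in this thread.

```python
from xml.sax.saxutils import escape

# Property names are real HBase settings; the VALUES are illustrative assumptions.
COMPACTION_SETTINGS = {
    # 0 disables time-based (periodic) major compactions, so they can be
    # triggered manually during quiet periods instead.
    "hbase.hregion.majorcompaction": "0",
    # Number of store files in a store that makes it a candidate for a
    # minor compaction; a higher value means fewer, larger compactions.
    "hbase.hstore.compactionThreshold": "5",
    # When a store holds more files than this, updates to the region are
    # held back until a compaction catches up (or a wait time expires).
    "hbase.hstore.blockingStoreFiles": "20",
}


def render_properties(settings):
    """Render a dict of settings as hbase-site.xml <property> elements."""
    lines = []
    for name, value in settings.items():
        lines.append("<property>")
        lines.append(f"  <name>{escape(name)}</name>")
        lines.append(f"  <value>{escape(value)}</value>")
        lines.append("</property>")
    return "\n".join(lines)


def main():
    print(render_properties(COMPACTION_SETTINGS))
```

Calling `main()` prints the fragment to paste inside the `<configuration>` element. Raising the thresholds trades fewer compactions for more store files per read, so any change is worth testing against the actual workload.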
Created 04-22-2016 06:13 AM
I haven't looked at the actual source code of major compaction. In practice, though, I have not seen any HBase client able to perform any transaction during a major compaction.
Created 04-25-2016 10:19 AM
That is very weird. After all, a minor compaction sometimes gets promoted to a major compaction. It would be pretty catastrophic if that made HBase inaccessible, and it is never mentioned anywhere. I totally agree that there will be a performance impact, of course.
http://www.ngdata.com/visualizing-hbase-flushes-and-compactions/
- during flushes & compactions, HBase keeps processing put and get requests, always giving a consistent view of the data
Created 04-29-2016 07:32 PM
You might be right. In my previous experience with HBase (with high write-throughput requirements), clients timed out every time and were not able to re-establish a connection until the major compaction was over. (To be precise, connections were not blocked or lost as soon as the major compaction started; they gradually died, and clients were not able to reconnect until the major compaction was over.) It might be a side effect.
Created 04-21-2016 02:26 PM
Thank you both for your replies. I too have been unable to find a definitive statement about the availability of a table/region during major compaction. I understand that there will be an impact on IO/CPU, and I plan on scheduling major compactions on weekends (or other periods of lower activity), but for a 24/7 application I need to understand whether the application will be unavailable/blocked during the minutes(?) of compaction.
Created 04-21-2016 03:38 PM
I am sure that there is no outage during a major compaction. Compactions are done on the store files while the old files still exist, and then the files are switched out. I don't think that basic process differs between minor and major compaction. The difference is that a major compaction takes all store files and also removes deleted rows, so it has more impact on the cluster. Sometimes, when all files are selected for a minor compaction, HBase will do a major compaction anyway. So unless an HBase committer jumps in and tells me otherwise, there is no outage during a major compaction.
http://www.slideshare.net/cloudera/hbasecon-2013-compaction-improvements-in-apache-hbase
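The "write new files while the old ones still exist, then switch" process described in this reply can be sketched as a toy model. This is not HBase's implementation: `Store`, `TOMBSTONE`, and the `during` hook are hypothetical names, store files are simplified to dictionaries, and the minor case here merges every file (in practice it selects a subset). It only illustrates why reads and flushes can proceed while a compaction is running, and that only the major variant drops delete markers.

```python
TOMBSTONE = object()  # hypothetical marker for a deleted row


class Store:
    """Toy model of one store: a list of immutable files, oldest first."""

    def __init__(self):
        self.files = []

    def flush(self, memstore):
        # A flush adds a new file; existing files are never modified.
        self.files = self.files + [dict(memstore)]

    def get(self, key):
        # Newest file wins; a tombstone hides older values.
        for f in reversed(self.files):
            if key in f:
                value = f[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self, major=False, during=None):
        selected = list(self.files)
        merged = {}
        for f in selected:          # oldest to newest, so newer values win
            merged.update(f)
        if major:
            # Only a major compaction drops delete markers.
            merged = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        if during is not None:
            during()                # old files are still serving requests here
        # Switch: replace the selected files, keep anything flushed meanwhile.
        newer = [f for f in self.files if all(f is not s for s in selected)]
        self.files = [merged] + newer
```

A usage example: flush two files, start a major compaction, and use the `during` hook to issue a read and a flush before the switch. Both succeed in this model, and the file flushed mid-compaction survives the switch.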