04-11-2019 01:33 PM - last edited on 04-12-2019 05:52 AM by cjervis
Hi, is there a release that supports a backup and restore strategy for HBase? I would like to initiate incremental backups on tables in an HBase cluster. Please advise. Thanks, Ido
04-17-2019 09:27 AM
There are several methods for performing backups in HBase. Here is a blog post from 2013, but it is still applicable:
To summarize the main backup methods:
Except for snapshots, these can perform incremental backups. Snapshots are useful if you want to roll back your database to a specific point in time.
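As a rough illustration of the snapshot route, the sketch below only prints the commands involved rather than running them. The table name, snapshot name, and destination URL are hypothetical placeholders. `snapshot` is the HBase shell command and `ExportSnapshot` is the MapReduce tool that ships with HBase, but please check the options against the documentation for your release before relying on them.

```shell
#!/bin/sh
# Dry-run sketch: prints the snapshot and export commands instead of executing them.
# All names below are hypothetical placeholders, not values from this thread.
TABLE="my_table"
SNAPSHOT="my_table_snap"
DEST="hdfs://backup-cluster:8020/hbase"
MAPPERS=4

# HBase shell statement that takes a snapshot of a table.
snapshot_cmd() {
  printf "snapshot '%s', '%s'\n" "$1" "$2"
}

# Command line that copies a snapshot to another cluster or filesystem.
export_cmd() {
  printf "hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot %s -copy-to %s -mappers %s\n" "$1" "$2" "$3"
}

snapshot_cmd "$TABLE" "$SNAPSHOT"
export_cmd "$SNAPSHOT" "$DEST" "$MAPPERS"
```

The first printed line is meant to be entered in the HBase shell and the second run from a regular shell on a cluster node. Note that a snapshot captures the whole table at a point in time, so this is not an incremental backup.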
04-22-2019 05:45 AM
I have another +1 for this question.
hbase backup create <type> <backup_path>
That seems very useful. Does Cloudera have any plans to implement this?
05-15-2019 01:29 PM
The HBase backup code you mention is implemented in Apache HBase 3.0.0 and higher. Upstream is where new development occurs, and while it has the latest code, that code is not always the safest or a complete implementation. At present, CDH 6.2 is based on HBase 2.1, which does not yet include this new functionality. (See the bottom of this reply for documentation links.)
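For reference, on a build that does include the upstream feature, the command quoted above takes a backup type and a destination path. The sketch below assembles that command line without running it. The helper name `build_backup_cmd`, the paths, and the table names are hypothetical, and the `full`/`incremental` types and the `-t` table list are taken from the upstream Apache HBase documentation as I understand it, so treat them as assumptions and confirm with the command's own help output on your version.

```shell
#!/bin/sh
# Sketch: assemble an 'hbase backup create' command line without executing it.
# build_backup_cmd is a hypothetical helper, not part of HBase.
build_backup_cmd() {
  bk_type="$1"        # assumed valid values: full | incremental
  bk_path="$2"        # destination, e.g. an HDFS URL
  bk_tables="${3:-}"  # optional comma-separated table list

  case "$bk_type" in
    full|incremental) ;;
    *) echo "unknown backup type: $bk_type" >&2; return 1 ;;
  esac

  if [ -n "$bk_tables" ]; then
    printf 'hbase backup create %s %s -t %s\n' "$bk_type" "$bk_path" "$bk_tables"
  else
    printf 'hbase backup create %s %s\n' "$bk_type" "$bk_path"
  fi
}

# Example: a full backup first, then an incremental one for two tables.
build_backup_cmd full hdfs://namenode:8020/backups
build_backup_cmd incremental hdfs://namenode:8020/backups t1,t2
```

None of this will run on CDH 6.2, since the `backup` subcommand is not present there.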
As for when this will be available from Cloudera, I do not have an answer. Although basing our HBase on 2.1 may seem dated, we frequently backport (cherry-pick) compatible features from later upstream versions in the 2.x stream. For smaller features this can be easy.
My research into your question shows that the upstream 'hbase backup' command is a small part of a larger effort. I would like to take this as an opportunity to describe what it can take to incorporate product changes into our offerings. Here are some things we would evaluate:
1. Is the implementation fully complete? It is being implemented in phases: HBASE-7912 (phase 1), HBASE-14123 (phase 2), HBASE-14414 (phase 3), HBASE-17362 (phase 4). Development on phase 3 is nearly complete.
2. Has it been debugged? HBASE-18886 is blocking phase 3 completion, and phase 4 is in the early stages.
3. Can it be automated? HBASE-17517 provides for API access in phase 4.
4. Since it requires filesystem storage, does it work with cloud storage? I did not do a lot of research on this condition, but since HBase runs on HDFS and HDFS can work with cloud, this is likely a yes.
5. Will it be backward-compatible? We can't release something that would break existing customers except during a major release.
6. Is it Enterprise-ready? Once we backport or move to an upstream release with the code, we would perform extensive, large-scale testing.
You can view our product documentation, including the documentation for the open source components shipped in each release, at https://archive.cloudera.com/cdh6/6.2.0/docs/ (replace the CDH version with the one you want; use "cdh5" to view the CDH 5 docs). The HBase documentation in our distribution is the same as the Apache HBase documentation for the equivalent release.
I hope this answer has been informative.