Hi, Is there a release that supports a backup and restore strategy for HBase? I would like to initiate incremental backups on tables in an HBase cluster. Please advise. Thanks, Ido
There are several methods to perform backups in HBase. Here is a blog post from 2013, but it is still applicable:
To summarize the main backup methods:
Except for snapshots, these can perform incremental backups. Snapshots are useful if you want to roll back your database to a specific point in time.
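As one illustration of an incremental approach: HBase's MapReduce Export tool accepts an optional start and end timestamp, so each run can cover only the cells written in one time window. The list of methods is not reproduced above, so treating Export as one of them is an assumption on my part. The sketch below only prints the command instead of running it, and the table name, output directory, and timestamps are hypothetical placeholders.

```shell
# Dry-run sketch: build the command line for one incremental Export window.
# Nothing here contacts a cluster; the command is only printed.

# Usage: export_cmd <table> <output_dir> <versions> <start_ms> <end_ms>
# Argument order follows the Export tool's usage string:
#   <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
export_cmd() {
  printf 'hbase org.apache.hadoop.hbase.mapreduce.Export %s %s %s %s %s\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical table, path, and one-day window (epoch milliseconds).
export_cmd my_table /backups/my_table/incr-001 1 1560000000000 1560086400000
```

To chain windows, the end timestamp of one run would become the start timestamp of the next. Check the argument list against the usage output of Export on your own release before relying on it.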
I have another +1 for this question.
hbase backup create <type> <backup_path>
That seems to be very useful. Does Cloudera have any plans to implement this?
The HBase backup code you mention is implemented in Apache HBase 3.0.0 and higher. This upstream is where new development occurs, and while it has the latest code it is not always the safest code or the complete implementation. At the present time, CDH 6.2 is based on HBase 2.1, which does not yet include this new functionality. (See the bottom of this reply for documentation links)
I do not have an answer on when this will be available from Cloudera. We frequently backport compatible features from upstream. While basing our HBase on 2.1 may seem quite dated, we frequently backport (cherry-pick) functionality from higher versions in the 2.x stream. For smaller features this can be easy.
My research into your question shows that the availability of the 'hbase backup' command upstream is a small part of a large effort. I would like to take this as an opportunity to describe what it can take to incorporate some product changes into our offerings. Here are some things we would evaluate:
1. Is the implementation fully complete? It is being implemented in phases: HBASE-7912 (phase 1), HBASE-14123 (phase 2), HBASE-14414 (phase 3), HBASE-17362 (phase 4). Development on phase 3 is nearly complete.
2. Has it been debugged? HBASE-18886 is blocking phase 3 completion and phase 4 is in the early stages.
3. Can it be automated? HBASE-17517 provides for API access in phase 4.
4. Since it requires filesystem storage, does it work with cloud storage? I did not do a lot of research on this point, but since HBase runs on HDFS and HDFS can work with cloud storage, this is likely a yes.
5. Will it be backward-compatible? We can't release something that would break existing customers except during a major release.
6. Is it Enterprise-ready? Once we backport or move to an upstream release with the code, we would perform extensive, large-scale testing.
You can view our product documentation, along with the documentation for the open source software in our distributions, at https://archive.cloudera.com/cdh6/6.2.0/docs/ (replace the CDH version with your desired version; use "cdh5" if you want to view the CDH 5 docs). The HBase documentation in our distribution is the same as the Apache HBase documentation for the equivalent release.
I hope this answer has been informative.
Hi @denloe ,
Thank you for the follow up.
You mentioned that the HBase backup code is implemented in Apache HBase 3.0.0 and higher, but I see that the latest HBase version is "2.2.0", so I'm not sure that I understand.
My experience is with the functionality currently offered with the HBase shipped in CDH. Because of this, I researched your question using the Apache HBase project Jira tracker. Some of the Jiras reference a "Fix Version" of 3.0.0, which led to some confusion on my part in my previous answer. You are correct that the 'backup' and 'restore' CLI commands are part of the 2.2.0 release and are functional. To make them ready for enterprise testing, there is currently at least one bug (HBASE-18886) and two improvements (HBASE-18872, HBASE-18886) to be addressed by the Apache HBase project.
For these commands to be included:
Personally, I would very much like to see the implementation completed but I do not anticipate they will be available anytime soon.
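For readers on an Apache HBase build that does ship the backup commands, the usage quoted earlier in the thread (hbase backup create <type> <backup_path>) takes a type of full or incremental. The sketch below only prints a possible full-then-incremental sequence and runs nothing against a cluster. The backup path and table name are hypothetical, and the -t table-list option is as I understand the upstream reference guide, so verify it against the command's own help output on your version.

```shell
# Dry-run sketch: build 'hbase backup create' command lines without executing them.
# Usage: backup_cmd <full|incremental> <backup_path> [comma_separated_tables]
backup_cmd() {
  local type="$1" path="$2" tables="${3:-}"
  case "$type" in
    full|incremental) ;;
    *) echo "unknown backup type: $type" >&2; return 1 ;;
  esac
  if [ -n "$tables" ]; then
    # -t is assumed to be the table-list option; confirm on your release.
    printf 'hbase backup create %s %s -t %s\n' "$type" "$path" "$tables"
  else
    printf 'hbase backup create %s %s\n' "$type" "$path"
  fi
}

# Hypothetical destination and table: one full backup, then an incremental one.
backup_cmd full hdfs://namenode:8020/backups my_table
backup_cmd incremental hdfs://namenode:8020/backups
```

An incremental backup only makes sense after a full backup of the same tables has completed, which is why the full command comes first.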