Created on 04-11-2019 01:33 PM - edited 09-16-2022 07:18 AM
Hi, Is there a release that supports a backup and restore strategy for HBase? I would like to initiate incremental backups on tables in an HBase cluster. Please advise. Thanks, Ido
Created 04-17-2019 03:10 AM
Created 04-17-2019 09:27 AM
Hello Idok,
There are several methods to perform backups in HBase. Here is a blog post from 2013, but it is still applicable:
Approaches to Backup and Disaster Recovery in HBase
To summarize the main backup methods:
Except for snapshots, these can perform incremental backups. Snapshots are useful if you want to roll back your database to a specific point in time.
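As a rough illustration of an incremental backup with one of the existing tools, here is a minimal sketch that uses the Export MapReduce job with a time range, so that only cells written since the previous run are exported. The table name, output path, and timestamps are hypothetical placeholders, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: builds an Export command limited to one time window.
# Table name, output directory and timestamps below are made-up examples.

# Print the Export command line for a single window.
# Args: table, output dir, start (ms since epoch), end (ms since epoch)
export_window_cmd() {
    table=$1
    outdir=$2
    start_ms=$3
    end_ms=$4
    # Export usage: <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
    # versions=1 exports only the newest version of each cell in the window.
    printf 'hbase org.apache.hadoop.hbase.mapreduce.Export %s %s 1 %s %s\n' \
        "$table" "$outdir" "$start_ms" "$end_ms"
}

last_end_ms=1554076800000   # assumed: end of the previous backup window
now_ms=1554163200000        # assumed: end of the current backup window

# Dry run: show the command that would be submitted on a cluster node.
export_window_cmd my_table "/backups/my_table/incr-$now_ms" "$last_end_ms" "$now_ms"
```

You would need to record the end timestamp of each run yourself and pass it as the start of the next one; this bookkeeping is not something the Export job does for you.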
David Wilder, Community Manager
Created 04-21-2019 02:39 AM
Hi David,
Thank you for the follow up.
Is there a plan to add the HBase "Backup and Restore" feature to Cloudera?
http://hbase.apache.org/book.html#br.overview
Thanks,
Ido
Created 04-22-2019 05:45 AM
I have another +1 for this question.
http://hbase.apache.org/book.html#_backup_and_restore_commands
hbase backup create <type> <backup_path>
That seems to be very useful. Does Cloudera have any plans to implement this?
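For anyone trying this on an upstream build that ships the feature, here is a small sketch of how the command above might be filled in, based on the Apache HBase reference guide. The backup path and table names are hypothetical, and the script only prints the commands. As I understand it, an incremental backup needs an earlier full backup of the same tables.

```shell
#!/bin/sh
# Sketch only: prints 'hbase backup create' command lines; nothing is executed.
# The backup root and table names below are made-up examples.

# Print a backup command.
# Args: type (full|incremental), backup root path, comma-separated table list
backup_create_cmd() {
    case "$1" in
        full|incremental) ;;
        *) echo "type must be 'full' or 'incremental'" >&2; return 1 ;;
    esac
    # -t selects the tables to include in the backup image.
    printf 'hbase backup create %s %s -t %s\n' "$1" "$2" "$3"
}

backup_root=hdfs://namenode.example.com:8020/data/backup   # assumed location

# First a full backup, later incremental ones on top of it.
backup_create_cmd full "$backup_root" SALES2,SALES3
backup_create_cmd incremental "$backup_root" SALES2,SALES3
```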
Created 05-14-2019 11:25 PM
Created 05-15-2019 01:29 PM
Hello,
The HBase backup code you mention is implemented in Apache HBase 3.0.0 and higher. This upstream is where new development occurs, and while it has the latest code it is not always the safest code or the complete implementation. At the present time, CDH 6.2 is based on HBase 2.1, which does not yet include this new functionality. (See the bottom of this reply for documentation links)
To answer your question on when this will be available from Cloudera, I do not have an answer. We frequently backport compatible features from upstream. While it may seem that basing our HBase on 2.1 leaves it quite dated, we frequently backport (cherry-pick) functionality from higher versions in the 2.x stream. For smaller features this can be easy.
My research into your question shows that the 'hbase backup' command availability upstream is a small part of a large effort. I would like to take this as an opportunity to describe what it can take to incorporate some product changes into our offerings. Here are some things we would evaluate:
1. Is the implementation fully complete? It is being implemented in phases: HBASE-7912 (phase 1), HBASE-14123 (phase 2), HBASE-14414 (phase 3), HBASE-17362 (phase 4). Development on phase 3 is nearly complete.
2. Has it been debugged? HBASE-18886 is blocking phase 3 completion and phase 4 is in the early stages.
3. Can it be automated? HBASE-17517 provides for API access in phase 4.
4. Since it requires filesystem storage, does it work with cloud storage? I did not do a lot of research on this condition, but since HBase runs on HDFS and HDFS can work with cloud, this is likely a yes.
5. Will it be backward-compatible? We can't release something that would break existing customers except during a major release.
6. Is it Enterprise-ready? Once we backport or move to an upstream release with the code, we would perform extensive, large-scale testing.
You can view our product documentation, along with the documentation for the open source components shipped in our distributions, at https://archive.cloudera.com/cdh6/6.2.0/docs/ (replace the CDH version with the one you want; use "cdh5" to view the CDH 5 docs). The HBase documentation for our distribution is the same as the Apache HBase documentation for the equivalent release.
I hope this answer has been informative.
David Wilder, Community Manager
Created 08-22-2019 02:28 AM
Hi @denloe ,
Thank you for the follow up.
You mentioned that the HBase backup code is implemented in Apache HBase 3.0.0 and higher, but I see that the latest HBase version is "2.2.0", so I'm not sure that I understand.
Please advise.
Thanks,
Ido
Created 08-25-2019 08:50 PM
Hi,
Is there any update?
Please advise.
Thanks,
Ido
Created 09-04-2019 01:46 PM
My experience is with the functionality currently offered with the HBase shipped in CDH. Because of this, I researched your question using the Apache HBase project Jira tracker. Some of the Jiras reference a "Fix Version" of 3.0.0, which led to some confusion on my part in my previous answer. You are correct that the 'backup' and 'restore' CLI commands are part of the 2.2.0 release and are functional. Before they are ready for enterprise testing, the Apache HBase project still needs to address at least one bug (HBASE-18886) and two improvements (HBASE-18872, HBASE-18886).
For these commands to be included:
Personally, I would very much like to see the implementation completed, but I do not anticipate that these commands will be available anytime soon.
David Wilder, Community Manager
Created 09-04-2019 10:31 PM
Hi @denloe
Thank you very much for your detailed information.
You mentioned that there are alternative existing backup methods that are enterprise-ready.
But those alternatives for incremental backups are performance-intensive.
Thanks,
Ido