
HBase backup

Explorer

Hi, is there a release that supports a backup and restore strategy for HBase? I would like to initiate incremental backups of tables in an HBase cluster. Please advise. Thanks, Ido


Explorer
Anyone?
Please?

Community Manager

Hello Idok,

 

There are several methods to perform backups in HBase. Here is a blog post from 2013 that is still applicable:

 

Approaches to Backup and Disaster Recovery in HBase

 

To summarize the main backup methods:

 

  • snapshots - captures a copy of the table at a specific point in time.  Think of this as symlinks to the HBase files, except that none of the files referenced by a snapshot will be removed from HBase, even if a split or compaction occurs.
  • replication - this requires a second cluster.  Incremental updates are sent to the replicated HBase table in the other cluster.
  • export - copies the table's data into files in a directory on HDFS or S3.
  • copyTable - copies a table in HBase to another table in HBase.  It can also copy a table to another cluster.

Except for snapshots, these can perform incremental backups.  Snapshots are useful if you want to roll back your database to a specific point in time.
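As a rough illustration of how export and copyTable can be run incrementally: both MapReduce tools accept a start and end timestamp, so a scheduled job can copy only the cells written since the previous run. The sketch below only builds the command lines (it does not execute them). The tool class names and argument order follow the HBase reference guide as I understand it, so verify them against your release; the table name, directory layout, and ZooKeeper peer address are hypothetical.

```python
"""Sketch: command lines for time-windowed (incremental) HBase Export and CopyTable runs.

Assumed tool syntax (check the reference guide for your HBase release):
  hbase org.apache.hadoop.hbase.mapreduce.Export <table> <outdir> [<versions> [<starttime> [<endtime>]]]
  hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=N --endtime=N --peer.adr=ZK <table>
Table names, directories, and the peer address used below are hypothetical.
"""
from datetime import datetime, timezone

EXPORT_CLASS = "org.apache.hadoop.hbase.mapreduce.Export"
COPYTABLE_CLASS = "org.apache.hadoop.hbase.mapreduce.CopyTable"


def to_millis(dt: datetime) -> int:
    """HBase cell timestamps default to milliseconds since the Unix epoch."""
    return int(dt.timestamp() * 1000)


def _check_window(start: datetime, end: datetime) -> None:
    """Reject empty or reversed windows."""
    if end <= start:
        raise ValueError("end must be later than start")


def export_window(table: str, base_dir: str, start: datetime, end: datetime,
                  versions: int = 1) -> list[str]:
    """Export cells with timestamps in [start, end) to a per-window directory."""
    _check_window(start, end)
    start_ms, end_ms = to_millis(start), to_millis(end)
    out_dir = f"{base_dir}/{table}/{start_ms}-{end_ms}"  # hypothetical layout
    return ["hbase", EXPORT_CLASS, table, out_dir,
            str(versions), str(start_ms), str(end_ms)]


def copytable_window(table: str, peer_zk: str, start: datetime, end: datetime) -> list[str]:
    """Copy the same time window into the same-named table on another cluster."""
    _check_window(start, end)
    return ["hbase", COPYTABLE_CLASS,
            f"--starttime={to_millis(start)}",
            f"--endtime={to_millis(end)}",
            f"--peer.adr={peer_zk}",  # e.g. "zk1,zk2,zk3:2181:/hbase" (hypothetical)
            table]


if __name__ == "__main__":
    last_run = datetime(2019, 6, 1, tzinfo=timezone.utc)  # end of the previous window
    now = datetime(2019, 6, 2, tzinfo=timezone.utc)
    print(" ".join(export_window("my_table", "/backups", last_run, now)))
    print(" ".join(copytable_window("my_table", "zk1,zk2,zk3:2181:/hbase", last_run, now)))
```

Storing the end of each window and reusing it as the start of the next one avoids gaps between runs. Assuming the tools treat the end time as exclusive, as HBase scan time ranges do, a cell written exactly on a boundary is picked up by the following run rather than twice.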

 

 

 



David Wilder, Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Learn more about the Cloudera Community:

Terms of Service

Community Guidelines

How to use the forum

Explorer

Hi David,

Thank you for the follow up.

 

Is there a plan to add the HBase "Backup and Restore" feature to Cloudera's distribution?

http://hbase.apache.org/book.html#br.overview

 

Thanks,

Ido

 

 

Explorer

Another +1 for this question.

http://hbase.apache.org/book.html#_backup_and_restore_commands

hbase backup create <type> <backup_path>

That seems to be very useful. Does Cloudera have any plans to implement this?
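For readers wondering what a schedule built on that command could look like, here is a minimal Python sketch that only assembles command lines; it does not run anything. It assumes the syntax from the Apache HBase reference guide, `hbase backup create <type> <backup_path>`, with `<type>` set to `full` or `incremental` and `-t` taking a comma-separated table list, and that your HBase build actually ships the feature. The backup path, table names, and the one-full-plus-six-incrementals plan are hypothetical.

```python
"""Sketch: compose `hbase backup create` command lines.

Assumed syntax (Apache HBase reference guide; verify for your release):
  hbase backup create <type> <backup_path> -t <table[,table...]>
where <type> is "full" or "incremental". Incremental backups build on an
earlier full backup of the same tables under the same backup root.
Paths, table names, and the schedule below are hypothetical.
"""

VALID_TYPES = ("full", "incremental")


def backup_create(backup_type: str, backup_path: str, tables: list[str]) -> list[str]:
    """Return the argv for one backup run; raises ValueError on bad input."""
    if backup_type not in VALID_TYPES:
        raise ValueError(f"backup type must be one of {VALID_TYPES}, got {backup_type!r}")
    if not tables:
        raise ValueError("at least one table is required")
    return ["hbase", "backup", "create", backup_type, backup_path, "-t", ",".join(tables)]


def weekly_plan(backup_path: str, tables: list[str]) -> list[list[str]]:
    """Hypothetical schedule: one full backup followed by six daily incrementals."""
    full = backup_create("full", backup_path, tables)
    incrementals = [backup_create("incremental", backup_path, tables) for _ in range(6)]
    return [full] + incrementals


if __name__ == "__main__":
    for argv in weekly_plan("hdfs://namenode:8020/data/backup", ["SALES2", "SALES3"]):
        print(" ".join(argv))
```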

Explorer
Is there any update?

Community Manager

Hello,

 

The HBase backup code you mention is implemented in Apache HBase 3.0.0 and higher.  The upstream project is where new development occurs, and while it has the latest code, that code is not always the most stable or a complete implementation.  At present, CDH 6.2 is based on HBase 2.1, which does not yet include this new functionality.  (See the bottom of this reply for documentation links.)

 

 

To answer your question on when this will be available from Cloudera, I do not have an answer.  Although basing our HBase on 2.1 may seem dated, we frequently backport (cherry-pick) compatible functionality from later versions in the 2.x stream.  For smaller features this can be easy.

 

My research into your question shows that the upstream 'hbase backup' command is a small part of a much larger effort.  I would like to take this opportunity to describe what it can take to incorporate product changes into our offerings.  Here are some things we would evaluate:

 

 

1.  Is the implementation fully complete?  It is being implemented in phases: HBASE-7912 (phase 1), HBASE-14123 (phase 2), HBASE-14414 (phase 3), and HBASE-17362 (phase 4).  Development on phase 3 is nearly complete.

2.  Has it been debugged?  HBASE-18886 is blocking phase 3 completion, and phase 4 is in the early stages.

3.  Can it be automated?  HBASE-17517 provides API access in phase 4.

4.  Since it requires filesystem storage, does it work with cloud storage?  I did not research this in depth, but since HBase runs on HDFS and HDFS can work with cloud storage, this is likely a yes.

5.  Will it be backward-compatible?  We can't release something that would break existing customers except during a major release.

6.  Is it enterprise-ready?  Once we backport the code or move to an upstream release that contains it, we would perform extensive, large-scale testing.

 

Product documentation for our releases, including documentation for the open source software shipped in our distributions, is available at https://archive.cloudera.com/cdh6/6.2.0/docs/ (replace the CDH version with your desired version; use "cdh5" to view the CDH 5 docs).  The HBase documentation in our distribution is the same as the Apache HBase documentation for the equivalent release.

 

 

I hope this answer has been informative.



David Wilder, Community Manager



Explorer

Hi @denloe ,

 

Thank you for the follow up.

 

You mentioned that the HBase backup code is implemented in Apache HBase 3.0.0 and higher, but I see that the latest HBase version is "2.2.0", so I'm not sure I understand.

 

Please advise.

Thanks,

Ido

Explorer

Hi,

 

Is there any update?

 

Please advise.

Thanks,

Ido

Community Manager

@IdoK,

 

My experience is with the functionality offered by the HBase shipped in CDH, so I researched your question using the Apache HBase project Jira tracker.  Some of the Jiras reference a "Fix Version" of 3.0.0, which led to some confusion on my part in my previous answer.  You are correct that the 'backup' and 'restore' CLI commands are part of the 2.2.0 release and are functional.  Before it is ready for enterprise testing, there is currently at least one bug (HBASE-18886) and two improvements (HBASE-18872, HBASE-18886) to be addressed by the Apache HBase project.

 

For these commands to be included:

  1. The supporting code for the commands still requires work in the Apache HBase project.
  2. Cloudera will only rebase our distribution of HBase during a major release, due to the risk of breaking backward compatibility.
  3. There are alternative, existing backup methods that are enterprise-ready.

Personally, I would very much like to see the implementation completed, but I do not anticipate these commands will be available anytime soon.



David Wilder, Community Manager



Explorer

Hi @denloe 

 

Thank you very much for your detailed information.

You mentioned that there are alternative existing backup methods that are enterprise-ready.

However, those alternatives are performance-intensive for incremental backups.


Thanks,

Ido

 
