
Which is the best method for backing up HBase data?

Hi,

Can anyone suggest which method is best for backing up HBase data: distcp, CopyTable, export/import, or cluster replication?

1 ACCEPTED SOLUTION

Hi @Rushikesh Deshmukh, for a list of backup options check this. CopyTable is a nice option: it runs as a MapReduce job with multiple mappers, so you can copy individual tables to the same or another cluster. Because it scans a live table, you can miss a few edits written during the copy, but you will end up with a useful copy.
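
For anyone wanting to try CopyTable, here is a minimal sketch of how the command line might be assembled. It is written in Python only so the pieces can be checked without a cluster. The helper name `copytable_command`, the table name `orders`, and the ZooKeeper quorum are hypothetical; `--peer.adr`, `--new.name`, `--starttime`, and `--endtime` are CopyTable options described in the HBase reference guide, so check them against your version. As far as I know, CopyTable expects the destination table to exist already.

```python
import shlex


def copytable_command(table, peer_adr=None, new_name=None,
                      starttime=None, endtime=None):
    """Build (but do not run) the argv for an HBase CopyTable job."""
    cmd = ["hbase", "org.apache.hadoop.hbase.mapreduce.CopyTable"]
    if starttime is not None:
        cmd.append(f"--starttime={starttime}")  # start of time range, epoch millis
    if endtime is not None:
        cmd.append(f"--endtime={endtime}")      # end of time range, epoch millis
    if new_name is not None:
        cmd.append(f"--new.name={new_name}")    # destination table name
    if peer_adr is not None:
        # Destination cluster as quorum:port:znode; omit to copy within the same cluster.
        cmd.append(f"--peer.adr={peer_adr}")
    cmd.append(table)                           # source table goes last
    return cmd


# Hypothetical values: copy table "orders" to a cluster behind zk1..zk3.
cmd = copytable_command("orders", peer_adr="zk1,zk2,zk3:2181:/hbase")
print(shlex.join(cmd))
```

On a real cluster the list could be handed to `subprocess.run(cmd, check=True)`. Passing `starttime` and `endtime` limits the copy to cells written in that window, which is one way to pick up edits missed by an earlier run.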


12 REPLIES

You can use snapshots to take backups: https://hbase.apache.org/book.html#ops.snapshots

@Rajeshbabu Chintaguntla, can you please explain the advantages of using snapshots over the other methods, and also provide the command used for taking snapshots?

@Rajeshbabu Chintaguntla, thanks for sharing this info and the link.

In order of preference (both options have very little impact on the running cluster):

Cluster replication: use this if the requirement is to recover in real time and a second cluster can be afforded.

Export snapshot: use this if recovering to the last snapshot taken is acceptable. The cost of this approach is lower because you can export the snapshot to any cheap storage (HDFS, S3, or anything else). Incremental backup is not possible with this approach, though, and old backups become obsolete when a new one is taken.
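
The export-snapshot option above can be sketched roughly as follows. This is a minimal Python helper that only builds the command line for HBase's `ExportSnapshot` tool; the helper name `export_snapshot_command`, the snapshot name, and the destination URL are hypothetical. The `-snapshot`, `-copy-to`, and `-mappers` options come from the HBase reference guide, so please verify them against your version.

```python
import shlex


def export_snapshot_command(snapshot, copy_to, mappers=None):
    """Build (but do not run) the argv for HBase's ExportSnapshot tool."""
    cmd = [
        "hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot,   # name of an existing snapshot
        "-copy-to", copy_to,     # root directory on the destination file system
    ]
    if mappers is not None:
        cmd += ["-mappers", str(mappers)]  # number of parallel copy tasks
    return cmd


# Hypothetical values: export snapshot "orders_snap" to a backup HDFS.
cmd = export_snapshot_command("orders_snap", "hdfs://backup-nn:8020/hbase", mappers=16)
print(shlex.join(cmd))
```

The snapshot has to exist before it can be exported, so in practice this would run after a `snapshot` command in the HBase shell.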

@asinghal, thanks for sharing this useful information.

Contributor

Hi Rushikesh,

I suggest you go through this article; it might be helpful:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hbase_snapshots_guide/content/ch_hbase_s...

Regards,

Karthik Gopal

Contributor

HBase Snapshots allow you to take a snapshot of a table without much impact on Region Servers. Snapshot, clone, and restore operations don't involve data copying. In addition, exporting a snapshot to another cluster has no impact on region servers.
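
Since the command for taking snapshots was asked about earlier in the thread, here is a small sketch of the HBase shell commands involved in the snapshot, clone, and restore operations. It is generated from Python only so it can be checked without a cluster. The function names and the table and snapshot names are hypothetical; `snapshot`, `clone_snapshot`, `restore_snapshot`, `disable`, and `enable` are HBase shell commands from the reference guide.

```python
def take_snapshot(table, snapshot):
    """HBase shell line that snapshots a table (the table can stay online)."""
    return [f"snapshot '{table}', '{snapshot}'"]


def clone_snapshot(snapshot, new_table):
    """HBase shell line that creates a new table from a snapshot."""
    return [f"clone_snapshot '{snapshot}', '{new_table}'"]


def restore_snapshot(table, snapshot):
    """HBase shell lines that roll a table back to a snapshot.

    The table is disabled first, because restore works on a disabled table.
    """
    return [
        f"disable '{table}'",
        f"restore_snapshot '{snapshot}'",
        f"enable '{table}'",
    ]


# Hypothetical names: snapshot "orders", then clone it into "orders_copy".
script = "\n".join(take_snapshot("orders", "orders_snap")
                   + clone_snapshot("orders_snap", "orders_copy")) + "\n"
print(script)
```

The printed lines can be typed into `hbase shell` or saved to a file and passed to it. Restoring replaces the table's current contents, so it may be worth taking a fresh snapshot before a restore.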

@Karthik Gopal

Thanks for the reply; I found some useful information at the given link. Are HBase snapshots the best method for backup in a live environment?

@Predrag Minovic, thanks for sharing this useful information.

Expert Contributor

Hi @Rushikesh Deshmukh

The following table provides an overview for quickly comparing these approaches, which I’ll describe in detail below.

http://blog.cloudera.com/blog/2013/11/approaches-to-backup-and-disaster-recovery-in-hbase/

I used distcp as well, but it did not work for me: the data was copied, but I ran into issues when running hbck.

If you want to create a backup on the same cluster, CopyTable and snapshots are very easy.

For inter-cluster backups, snapshots work well.

Let me know if you need more details.


hbase-data-backup.png

Also, the link below is very useful and clear:

http://hbase.apache.org/0.94/book/ops.backup.html
