I would like to back up and restore HBase table snapshots to a non-HDFS location outside the cluster.
I am able to use ExportSnapshot to copy the table data outside the cluster:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot <snapname> -copy-to <bu-appliance-URI>
but cannot use
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot <snapname> -copy-from <bu-appliance-URI> -copy-to <hdfs://hbase>
because the backup appliance URI is not an hdfs:// URI.
I can use DistCp to copy from <bu-appliance-URI> to hdfs://staging-dir and then use ExportSnapshot to bring it back into HBase, but we want to avoid the need for a staging directory.
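For reference, the staging-directory workaround we want to avoid looks roughly like this (the namenode address, staging path, and snapshot name are placeholders):

```shell
# Step 1: copy the exported snapshot from the appliance into a
# temporary HDFS staging directory
hadoop distcp <bu-appliance-URI> hdfs://namenode:8020/staging/hbase-backup

# Step 2: export from the staging directory back under the HBase root dir
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot <snapname> \
  -copy-from hdfs://namenode:8020/staging/hbase-backup \
  -copy-to hdfs://namenode:8020/hbase

# Step 3: restore the snapshot, then remove the staging directory
echo "restore_snapshot '<snapname>'" | hbase shell -n
hadoop fs -rm -r /staging/hbase-backup
```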
I did some experiments and found I could DistCp from <bu-appliance-URI>/.hbase-snapshot to hdfs://hbase, run another DistCp from <bu-appliance-URI>/archive to hdfs://hbase, and then restore the table from the snapshot.
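Concretely, the experiment was along these lines (directory names follow the standard HBase root-directory layout; the namenode address and snapshot name are placeholders):

```shell
# Copy the snapshot metadata back under the HBase root directory
hadoop distcp <bu-appliance-URI>/.hbase-snapshot \
  hdfs://namenode:8020/hbase/.hbase-snapshot

# Copy the referenced HFiles back into the archive directory
hadoop distcp <bu-appliance-URI>/archive \
  hdfs://namenode:8020/hbase/archive

# Restore the table from the snapshot
echo "restore_snapshot '<snapname>'" | hbase shell -n
```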
Is this a valid approach? If not can you suggest something better?
Someone has suggested using the RestoreSnapshotHelper class; is it capable of restoring an exported snapshot from an external filesystem back into HBase if the table has been deleted?
Hi, that isn't how ExportSnapshot is typically used; it has only been tested with HDFS and S3 as a source/target. ExportSnapshot does, however, use DistCp under the covers. Have you fully verified the data after the restore? Also, what protocol are you using when copying the table to/from the appliance with ExportSnapshot, and what is the URI? Thanks.
Hi, thanks for responding!
I know it is "typically" used to move tables from one HDFS filesystem to another.
But I need to move a complete table out of HDFS to a filesystem that we are developing, and back again later; it is implemented using the Hadoop-compatible FileSystem class API.
This should be the same type of thing that is done with the S3 interface to HDFS. It works fine with DistCp and appears to work with ExportSnapshot as well: at least, I was able to delete the snapshot, drop the table, and the restore appeared to succeed. Is this similar to what was tried with an S3 HBase backup/restore?
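Assuming the custom filesystem follows the usual Hadoop FileSystem contract, it can be wired in the same way third-party filesystems are: put the client jar on the classpath and map the URI scheme to the implementation class. The scheme, class name, and jar path below are hypothetical:

```shell
# Make the custom filesystem jar visible to the client JVM
# (jar path is hypothetical)
export HADOOP_CLASSPATH=/opt/bu-appliance/bufs-client.jar:$HADOOP_CLASSPATH

# Map the hypothetical "bufs" scheme to its implementation class,
# then run DistCp against the appliance URI
hadoop distcp \
  -Dfs.bufs.impl=com.example.fs.BuApplianceFileSystem \
  bufs://appliance/hbase-backup \
  hdfs://namenode:8020/staging/hbase-backup
```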
Are there any specific tests you suggest we should do?
How did you test the S3 export to convince yourself that the data is correct?
How did you import the table from S3 filesystem back to hdfs?
Do you see any potential problems with this approach assuming we correctly implemented our filesystem class and it works with DistCp?
As far as testing for data integrity, I'm not sure how it was done for the S3 test, but if you aren't modifying the data you can use the verifyrep (VerifyReplication) job after re-importing the table under a different table name (make sure to keep the same splits).
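A sketch of that verification step. VerifyReplication normally compares a table against a replication peer, so a peer pointing back at the same cluster is assumed here, and the --peerTableName option (needed when the re-imported copy has a different name) only exists in newer HBase releases; the peer id, ZooKeeper quorum, and table names are placeholders:

```shell
# Add a replication peer that points back at this cluster
echo "add_peer 'self', CLUSTER_KEY => 'zk1:2181:/hbase'" | hbase shell -n

# Compare the original table with the re-imported copy row by row
hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication \
  --peerTableName=restored_table self original_table
```

The job reports GOODROWS and BADROWS counters in its output; a clean run should show zero BADROWS.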
One problem: can ExportSnapshot be passed an external .jar and library files using -files and -libjars to support the external filesystem, as can be done with DistCp?
If not, is there any other way to make ExportSnapshot work if the DistCp operation needs an external .jar and libraries?
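One avenue worth testing (not verified; it depends on whether the installed ExportSnapshot version passes generic options through, and the jar path and scheme below are hypothetical):

```shell
# Put the custom filesystem jar on the client classpath
export HADOOP_CLASSPATH=/opt/bu-appliance/bufs-client.jar:$HADOOP_CLASSPATH

# If generic options are honored, -libjars ships the jar to the
# MapReduce tasks; whether ExportSnapshot accepts it is an assumption
# to be confirmed against your HBase version
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -libjars /opt/bu-appliance/bufs-client.jar \
  -snapshot <snapname> \
  -copy-from bufs://appliance/hbase-backup \
  -copy-to hdfs://namenode:8020/hbase
```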