
HBase table dump to flat files

New Contributor

Hello,

I want to back up HBase tables to flat files, but with the command below I can only back up a single table. Is there a better way to back up all the tables to flat files with a single command?

hbase org.apache.hadoop.hbase.mapreduce.Export "hbase:meta" "<output_dir>"

1 ACCEPTED SOLUTION

Guru

@Anurag Ramayanapu

There are several ways to get data out of HBase for backups or other purposes.

1. Export / Import:

The Export tool exports table data via a MapReduce (MR) job to sequence files in any Hadoop-compatible file system. The Import tool can later be used to load the data back into HBase.

More information can be found here: https://hbase.apache.org/book.html#_export
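
For example, a single table can be exported and later re-imported like this (the table name and output directory are illustrative):

# Export the table to sequence files in HDFS:
hbase org.apache.hadoop.hbase.mapreduce.Export "my_table" "/backups/my_table"
# Later, import the sequence files back into an (existing) table:
hbase org.apache.hadoop.hbase.mapreduce.Import "my_table" "/backups/my_table"

Note that Import expects the destination table to already exist.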

2. Snapshot + ExportSnapshot:

Taking a snapshot of an HBase table is a lightweight operation that saves references to the actual data files included in the snapshot. After taking a snapshot, the snapshot files can be exported to any Hadoop-compatible file system. Exported snapshot files are in HBase-native file formats. Snapshots operate at the table level.

More information here: https://hbase.apache.org/book.html#ops.snapshots.e...
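
As a rough sketch (the table, snapshot, and cluster names are illustrative), a snapshot is taken in the HBase shell and then copied off-cluster with the ExportSnapshot tool:

# In the HBase shell, take the snapshot:
snapshot 'my_table', 'my_table_snapshot'

# From the command line, copy the snapshot files to another file system
# (16 mappers is just an example degree of parallelism):
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot my_table_snapshot -copy-to hdfs://backup-nn:8020/hbase -mappers 16

On the destination cluster, the table can then be recreated from the snapshot with the shell's restore_snapshot or clone_snapshot commands.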

3. CopyTable:

CopyTable performs a live scan of the table data and inserts it into another HBase cluster. Note that CopyTable requires a live HBase cluster at the sink, which makes it useful for multi-datacenter and disaster-recovery (DR) setups.

More information here: https://hbase.apache.org/book.html#_copytable
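
For instance, a table can be copied to a peer cluster identified by its ZooKeeper quorum (the quorum address and table name below are illustrative):

hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=zk1,zk2,zk3:2181:/hbase my_table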

4. Custom MR job + BulkLoad:

A custom MR job can be written to export the data out of a live cluster. The job can use any encoding and output format desired (for example, TSV or HFiles). If HFiles are generated, the data can later be bulk loaded into a live HBase cluster with the LoadIncrementalHFiles tool. For CSV or TSV output, the ImportTsv tool can be used.

More information here: https://hbase.apache.org/book.html#mapreduce and https://hbase.apache.org/book.html#_completebulklo...
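
As a sketch of the TSV path (the column mapping, paths, and table name are illustrative), ImportTsv can write HFiles instead of puts, and LoadIncrementalHFiles then moves them into the live table:

# Generate HFiles from TSV input rather than writing to the table directly:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1 -Dimporttsv.bulk.output=/tmp/my_table_hfiles my_table /input/my_table.tsv

# Bulk load the generated HFiles into the table:
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/my_table_hfiles my_table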

5. Backup Tool:

A native backup tool is in the works; it defines a backup command, file formats, and utilities to manage multi-table, full, and incremental backups. We are (hopefully) very close to committing the backup patches and making them available in the HDP 2.3 series soon.

More information here: https://issues.apache.org/jira/browse/HBASE-7912 and https://issues.apache.org/jira/browse/HBASE-14030

Except for the backup tool, the other solutions all work at the table level, since in most cases different tables call for different strategies. However, it should be relatively easy to write a simple script or tool that gets the list of all tables from the master and invokes the corresponding tool for each table; a sketch follows.
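
A minimal sketch of such a script using the Export tool (the output paths are illustrative, and parsing of the shell's list output is version-dependent, so it may need adjustment):

#!/bin/bash
# Export every table reported by the HBase shell to flat files.
OUTPUT_ROOT="/backups/hbase/$(date +%Y%m%d)"

# 'list' prints a TABLE header, the table names, then a "row(s)" summary;
# keep only the table-name lines in between.
TABLES=$(echo "list" | hbase shell 2>/dev/null | sed -e '1,/^TABLE$/d' -e '/row(s)/,$d')

for table in $TABLES; do
  hbase org.apache.hadoop.hbase.mapreduce.Export "$table" "$OUTPUT_ROOT/$table"
done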


2 REPLIES


New Contributor

Hello Enis,

Thank you so much for taking the time to reply. Is there a sample script you can share to automate table dumps through the Export command?