HBase export snapshot issue for a table

Explorer

I'm exporting HBase tables from one cluster to another using the HBase ExportSnapshot command in a terminal. It works well for most of my tables, but it doesn't work for one specific table, which is the largest. All the other tables are gigabytes in size, but the table I have issues with is terabytes (TB) in size. My conclusion is that there might be a size limit, but I need a more convincing reason, or ideas for alternative ways to transfer this table to the new cluster.

The steps I'm following when transferring the tables are:

1. Take a snapshot of the table on the old cluster: snapshot 'table_name', 'snapshot_name'

2. Run hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshot_name -copy-to hdfs://ip-address//apps/hbase/data -mappers 16

3. On the new cluster, run restore_snapshot 'snapshot_name'
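For reference, the three steps above can be sketched as a small dry-run script. It only prints the commands so they can be reviewed before being run by hand. The variable names and their default values (TABLE, SNAPSHOT, DEST, MAPPERS) are placeholders introduced for illustration, not values from this thread; the ExportSnapshot flags are the ones already used in step 2.

```shell
#!/bin/sh
# Dry-run sketch of the snapshot export workflow: prints commands, runs nothing.
# All values below are hypothetical placeholders; override them via the environment.
TABLE="${TABLE:-table_name}"
SNAPSHOT="${SNAPSHOT:-${TABLE}-snapshot}"
DEST="${DEST:-hdfs://ip-address/apps/hbase/data}"
MAPPERS="${MAPPERS:-16}"

# Step 1 (HBase shell, source cluster): take the snapshot under an explicit name.
snapshot_cmd() {
  printf "snapshot '%s', '%s'\n" "$TABLE" "$SNAPSHOT"
}

# Step 2 (terminal, source cluster): copy the snapshot to the destination root dir.
export_cmd() {
  printf 'hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot %s -copy-to %s -mappers %s\n' \
    "$SNAPSHOT" "$DEST" "$MAPPERS"
}

# Step 3 (HBase shell, destination cluster): restore by snapshot name, not table name.
restore_cmd() {
  printf "restore_snapshot '%s'\n" "$SNAPSHOT"
}

snapshot_cmd
export_cmd
restore_cmd
```

Note that restore_snapshot takes the name of the snapshot, so it is worth running list_snapshots on the destination first to see the exact name to pass.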

I would appreciate a quick reply, since this is a very urgent matter.

Expert Contributor

What do you mean when you say it doesn't work? What's the error?

Explorer

I mean, it doesn't show any error. The export job for the table completes just like for any other table. However, when I run list_snapshots in the HBase shell on the new cluster, the snapshot doesn't appear in the list, unlike those of all the other tables.

Expert Contributor

Can you confirm the owner of the copied directory on the destination cluster? If it is different, you may want to change it. Also check the copied data on the destination cluster: the data size, the metadata files (snapshot files), and whether the data is present at all. Sometimes it is there but in a different folder, probably some tmp or hidden folder; move it if that is the case.
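A minimal sketch of the ownership check, assuming the files should belong to the hbase user (as in the listings further down) and that the destination's HBase root dir is /apps/hbase/data. The helper name wrong_owner is made up for this example; it just filters `hdfs dfs -ls` output and prints the owner and path of anything owned by someone else.

```shell
# Print "owner path" for every entry not owned by the expected user.
# Reads `hdfs dfs -ls` output on stdin; the expected user defaults to hbase.
wrong_owner() {
  awk -v user="${1:-hbase}" '
    $1 ~ /^[-d]/ && $3 != user { print $3, $8 }   # skips the "Found N items" header
  '
}

# Possible usage on the destination cluster (paths are assumptions):
#   hdfs dfs -ls -R /apps/hbase/data/.hbase-snapshot | wrong_owner hbase
#   hdfs dfs -ls -R /apps/hbase/data/archive/data/default/<table> | wrong_owner hbase
```

If this prints anything, `hdfs dfs -chown -R` on the affected directory, run as a user with sufficient HDFS privileges, is one way to change the owner.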

Explorer

$ hdfs dfs -ls /apps/hbase/data/archive/data/default

Found 19 items

drwxr-xr-x - hbase hdfs 0 2017-11-26 17:15 /apps/hbase/data/archive/data/default/change_requests

drwxr-xr-x - hbase hdfs 0 2017-12-07 14:40 /apps/hbase/data/archive/data/default/companies

drwxr-xr-x - hbase hdfs 0 2017-11-26 17:05 /apps/hbase/data/archive/data/default/company_news_batched

$ hdfs dfs -ls /apps/hbase/data/data/default

Found 18 items

drwxr-xr-x - hbase hdfs 0 2017-11-26 17:17 /apps/hbase/data/data/default/change_requests

drwxr-xr-x - hbase hdfs 0 2017-11-26 17:10 /apps/hbase/data/data/default/company_news_batched

hbase(main):001:0> list_snapshots

SNAPSHOT TABLE + CREATION TIME

change_requests-snapshot change_requests (Sun Nov 26 17:14:03 +0000 2017)

company_news_batched-snapshot company_news_batched (Sun Nov 26 17:03:51 +0000 2017)

As you can see from the output of the commands above, the companies table exists in the archive folder, but I can't restore it from the archive using the restore_snapshot 'companies' command, because its snapshot isn't in the snapshot list like the other tables.

So the data is there in the archive, but how can I move it from there to the normal /hbase/data/ folder? The last time I did that (moving from the archive directory to the data directory), it created lots of complications.

Expert Contributor

Good that the data is there. Here are the next things you can do:

1. Check the table size on the source cluster and the snapshot size on the destination cluster [hdfs dfs -du -h /apps/hbase/data/<table> or something like that].

2. Compare the files in the snapshots that are listed with those in the one that is not.

During a snapshot export, the data is first copied into a temp dir on the destination cluster and then moved to the actual folder. If you see some files missing [the metadata file, probably with a .snapshot extension], try searching for them in other places, a .tmp folder as far as I remember. If you happen to find them, move them to the table's snapshot folder.

I don't remember all the dir names and locations exactly, so you will have to explore around a bit.
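The comparison in step 2 can be sketched roughly as follows. This assumes snapshot metadata lives under .hbase-snapshot in the HBase root dir, with in-progress snapshots under .hbase-snapshot/.tmp, and that a complete snapshot directory holds files such as .snapshotinfo and data.manifest; verify these on your own cluster. The helper name missing_in_second and the listing file names are made up for the example.

```shell
# Print file names present in the first listing but absent from the second.
# $1: saved `hdfs dfs -ls` output for a snapshot that shows up in list_snapshots
# $2: saved `hdfs dfs -ls` output for the snapshot directory that does not
missing_in_second() {
  awk '
    $1 !~ /^[-d]/ { next }                           # skip "Found N items" header
    { n = split($8, parts, "/"); name = parts[n] }   # keep last path component only
    FILENAME == ARGV[1] { seen[name] = 1; next }     # first file read: suspect listing
    !(name in seen) { print name }                   # second file read: reference listing
  ' "$2" "$1"
}

# Possible usage on the destination cluster (paths are assumptions):
#   hdfs dfs -ls /apps/hbase/data/.hbase-snapshot/change_requests-snapshot > good.txt
#   hdfs dfs -ls /apps/hbase/data/.hbase-snapshot/.tmp/<snapshot_name> > suspect.txt
#   missing_in_second good.txt suspect.txt
```

For the size check in step 1, `hdfs dfs -du -s -h <path>` prints a single summarized total per path, which is easier to compare across clusters than the per-directory output.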