Hello,
My client is asking me for a way to back up Hive tables to tape. I know, this is not "big-data style", but it is mandatory for them, so I need to accommodate it.
I found a way to do this, but restoring a table requires the following steps:
- create the table using the DDL previously backed up with the "show create table" statement;
- mv the files into the directory of the table just created (warehouse dir/db/table);
- run "msck repair table" on that table.
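To make the restore steps concrete, here is a minimal sketch in Python that only builds and prints the three commands; it does not run them. The DDL file name, the staging directory, the table directory and the table name are all placeholders I made up for illustration, and the sketch assumes the files restored from tape have already been copied into HDFS.

```python
import shlex


def restore_commands(ddl_file, staged_dir, table_dir, qualified_table):
    """Return the commands for the three restore steps, in order.

    All four arguments are placeholders supplied by the caller.
    """
    return [
        # 1. Recreate the table from the DDL saved with "show create table".
        ["hive", "-f", ddl_file],
        # 2. Move the restored files into the directory of the new table.
        ["hdfs", "dfs", "-mv", staged_dir + "/*", table_dir],
        # 3. Ask Hive to register the partition directories it finds.
        ["hive", "-e", "MSCK REPAIR TABLE " + qualified_table],
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    commands = restore_commands(
        ddl_file="mytable_ddl.sql",
        staged_dir="/tmp/restore/mytable",
        table_dir="/user/hive/warehouse/mydb.db/mytable",
        qualified_table="mydb.mytable",
    )
    for cmd in commands:
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(shlex.join(cmd))
```

Printing the commands instead of executing them makes it easy to review each step before running it by hand.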
The command runs without error. However, the original table has about 111 million records, while the restored table has only 37 million.
I compared the HDFS size of the two folders and they are the same.
I compared the number of partitions of the two tables and they are the same.
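The partition comparison could also be done by name rather than by count. The sketch below is only an illustration: it assumes the output of "show partitions" for each table has been saved as plain text, one partition per line, and the partition names in the example are invented.

```python
def parse_partitions(show_partitions_output):
    """Turn "show partitions" text (one partition per line) into a set."""
    return {
        line.strip()
        for line in show_partitions_output.splitlines()
        if line.strip()
    }


def compare_partitions(source_output, target_output):
    """Report partitions present in one table but not in the other."""
    source = parse_partitions(source_output)
    target = parse_partitions(target_output)
    return {
        "missing_in_target": sorted(source - target),
        "extra_in_target": sorted(target - source),
    }


if __name__ == "__main__":
    # Invented sample data; real input would come from the two tables.
    source_text = "dt=2017-01-01\ndt=2017-01-02\ndt=2017-01-03\n"
    target_text = "dt=2017-01-01\ndt=2017-01-03\n"
    print(compare_partitions(source_text, target_text))
```

If both lists come back empty, the two tables have the same partition names, and the difference in record count has to come from somewhere else.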
I tried running msck repair once again (just in case), but the result did not change.
So I think the problem must be in the msck command: the files are in place, but somehow it skips some of them when repairing the table.
What do you think?
Bye
Omar