Reply
Explorer
Posts: 18
Registered: ‎09-20-2017

Backup Hive tables to external tapes

Hello,

 

I have date-partitioned internal Hive tables and I want to backup year 2017 to external tapes (and then drop them). This is ~10TB of data.

 

What is the recommended way to copy this data out of HDFS?

 

Thank you,

Gerasimos

Posts: 468
Topics: 14
Kudos: 77
Solutions: 41
Registered: ‎09-02-2016

Re: Backup Hive tables to external tapes

@gerasimos

 

There are different approaches. First let us see, the available options and their pros & cons then you can choose (or) if possible combine them as needed


1. Cloudera Manager -> Backup Menu option (or) distcp option

Pros:
a. Easy to take backup

Cons:
a. It will support between two different clusters, so it may not be suitable for your requirement


2. Export/Import Option.

Step1: Execute the below command from HDFS and export the working db.table to HDFS path and move to local as needed. It will export both data & metadata
> hive -S -e "export table $schema_file1.$tbl_file1 to '$HDFS_DATA_PATH/$tbl_file1';"


Step2: Run the below import command twice, First import will through an error as table doesn't exist and it will create table but the Second import will import the data too

> hive -S -e "import table $schema_file1.$tbl_file1 from '$HDFS_DATA_PATH/$tbl_file1';"

Note: You can hard code $ with actual path/file/table


Pros:
a. Export/Import will take care of both data & metadata. so you don't need to handle metadata separately

Cons:
a. I've used it long back for non-partitioned tables, not sure how it will support partition tables, pls double check
b. Need to apply import/export for each table

 

3. Move HDFS data to local and local to tape & take metadata backup separately. Ex: Mysql - so many links available online about how to take mysql backup

 

pros:

a. Metadatabackup is possible for entire db

 

Cons:

a. May Need to take hdfs file by file depends upon your local FS capacity

 

 

There could be other options too, please update below if you/anyone find something

 

Explorer
Posts: 18
Registered: ‎09-20-2017

Re: Backup Hive tables to external tapes

Thank you @saranvisa

 

Option 1 is not feasible.

 

Option 2 exports to HDFS. I do not have so much space left (10TB) to export and also I need to move it from HDFS to the tapes.

 

Option 3: what is 'local' in that case?

 

I assume that I need a workflow that will copy (that amount of) data from Hive (HDFS) to a remote location or a mount folder.  

Champion
Posts: 731
Registered: ‎05-16-2016

Re: Backup Hive tables to external tapes

Would you consider converting them to HAR file .
Below is the reference for further reading

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Archiving
Posts: 468
Topics: 14
Kudos: 77
Solutions: 41
Registered: ‎09-02-2016

Re: Backup Hive tables to external tapes

@gerasimos

 

Option 3: I meant linux File system as local 

 

Also just want to make sure the 10TB that you have mentioned is before replication (or) after replication?

Explorer
Posts: 18
Registered: ‎09-20-2017

Re: Backup Hive tables to external tapes

Thank you @saranvisa

 

10TB is the data to move (and delete to free space)

 

2TB is the space currently left on each node.

Announcements