
Backup Hive tables to external tapes

Rising Star



I have date-partitioned internal Hive tables and I want to back up year 2017 to external tapes (and then drop those partitions). This is ~10TB of data.


What is the recommended way to copy this data out of HDFS?


Thank you,






There are different approaches. First, let's look at the available options and their pros & cons; then you can choose one (or), if possible, combine them as needed.

1. Cloudera Manager -> Backup menu option (or) distcp

Pro:

a. Easy to take a backup

Con:

a. It works between two different clusters, so it may not be suitable for your requirement
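For reference, a distcp invocation might look like the sketch below. The hostnames, ports, and tape-staging mount are assumptions; as noted above, it needs either a second cluster or a posix mount visible to every worker node, which may rule it out here. The script only prints the commands for review.

```shell
# A sketch only; cluster hostnames and paths below are assumptions.
# distcp copies between two clusters, or (if every worker node sees
# the same posix mount) to a file:// destination.
SRC="hdfs://prod-nn:8020/user/hive/warehouse/mydb.db/events/dt=2017-01"
DST="hdfs://backup-nn:8020/backup/events/dt=2017-01"

COPY_CMD="hadoop distcp $SRC $DST"
# Variant that writes straight to a tape-staging mount:
MOUNT_CMD="hadoop distcp $SRC file:///mnt/tape_staging/events/dt=2017-01"

echo "$COPY_CMD"
echo "$MOUNT_CMD"
```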

2. Export/Import Option.

Step 1: Run the command below to export the given db.table to an HDFS path, then move it to local storage as needed. It exports both data & metadata.
> hive -S -e "export table $schema_file1.$tbl_file1 to '$HDFS_DATA_PATH/$tbl_file1';"

Step 2: Run the import command below twice. The first import will throw an error because the table doesn't exist, but it will create the table; the second import will import the data as well.

> hive -S -e "import table $schema_file1.$tbl_file1 from '$HDFS_DATA_PATH/$tbl_file1';"

Note: You can replace the $variables with the actual path/file/table names.

Pro:

a. Export/Import takes care of both data & metadata, so you don't need to handle metadata separately

Cons:

a. I used it long back for non-partitioned tables; I'm not sure how well it supports partitioned tables, so please double-check
b. You need to run export/import for each table
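Since the tables are date-partitioned, the export could be done partition by partition with Hive's `EXPORT TABLE ... PARTITION (...)` syntax. A sketch, assuming a table `mydb.events` with a month-string partition column `dt` and a hypothetical staging path; it only prints the statements so you can review them before piping to `sh`:

```shell
# Prints one EXPORT statement per 2017 month partition -- review the output,
# then pipe it to sh to execute. All names here are assumptions.
EXPORT_ROOT=/backup/exports          # assumed HDFS staging path

for month in $(seq -w 1 12); do
  part="2017-${month}"
  echo "hive -S -e \"export table mydb.events partition (dt='${part}') to '${EXPORT_ROOT}/events_${part}';\""
done
```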


3. Move the HDFS data to the local file system, then from local to tape, and take the metadata backup separately. Ex: MySQL (many guides about taking MySQL backups are available online)



Pro:

a. Metadata backup is possible for the entire db



Con:

a. You may need to move HDFS files one by one, depending on your local FS capacity
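Option 3 could be sketched roughly as below. The partition path, staging directory, tape device, and metastore credentials are all assumptions, and the commands are printed rather than executed:

```shell
# Sketch of option 3: HDFS -> local disk -> tape, plus a separate metastore dump.
PART="/user/hive/warehouse/mydb.db/events/dt=2017-01"   # assumed partition path
LOCAL="/data/staging"                                   # assumed local staging dir

GET_CMD="hdfs dfs -get $PART $LOCAL/"                   # HDFS -> local disk
TAR_CMD="tar -cvf /dev/st0 $LOCAL/dt=2017-01"           # local disk -> tape drive
DUMP_CMD="mysqldump -u hive -p metastore > metastore_backup.sql"  # metadata backup

printf '%s\n' "$GET_CMD" "$TAR_CMD" "$DUMP_CMD"
```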



There could be other options too; please reply below if you or anyone else finds something.


Rising Star

Thank you @saranvisa


Option 1 is not feasible.


Option 2 exports to HDFS. I do not have that much space left (10TB) for the export, and I would also still need to move it from HDFS to the tapes.


Option 3: what is 'local' in that case?


I assume that I need a workflow that will copy (that amount of) data from Hive (HDFS) to a remote location or a mount folder.  




Option 3: by 'local' I meant the Linux file system.


Also, I just want to make sure: is the 10TB you mentioned before replication (or) after replication?

Rising Star

Thank you @saranvisa


10TB is the data to move (and delete to free space)


2TB is the space currently left on each node.
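One way to check that distinction (paths are assumptions; the two-column `-du` output that includes raw disk usage requires a reasonably recent Hadoop release):

```shell
# Logical size vs raw disk usage -- with 3x replication, the disk-consumed
# column of -du is roughly 3x the logical size. Paths are assumptions.
DU_CMD="hdfs dfs -du -s -h /user/hive/warehouse/mydb.db/events"
# Replication factor of an individual file:
STAT_CMD="hdfs dfs -stat %r /user/hive/warehouse/mydb.db/events/dt=2017-01/000000_0"

echo "$DU_CMD"
echo "$STAT_CMD"
```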

Rising Star

Dear @saranvisa


Concerning your suggestion to export the table:


> hive -S -e "export table $schema_file1.$tbl_file1 to '$HDFS_DATA_PATH/$tbl_file1';"


I did export a partition of the table, and I saw that I got:


1. _metadata

2. a copy of the directory structure as in /user/hive/warehouse/myTable/thePartition


Since step [2] is too time-consuming and duplicates the partition's data, is there another way to export only the _metadata and copy the files directly from the Hive warehouse folder? Or am I missing something about the export command?
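One possible workaround, sketched below (this is not part of the EXPORT command itself, and the table name and paths are assumptions): capture only the DDL with SHOW CREATE TABLE, copy the partition files directly from the warehouse path, and re-register the partitions with MSCK REPAIR TABLE on restore. Commands are printed for review rather than executed:

```shell
# Metadata-only alternative to EXPORT (all names assumed).
DDL_CMD="hive -S -e 'show create table mydb.events;' > events_ddl.sql"
COPY_CMD="hdfs dfs -get /user/hive/warehouse/mydb.db/events/dt=2017-01 /data/staging/"
# After restoring the DDL and putting the files back, re-register partitions:
REPAIR_CMD="hive -S -e 'msck repair table mydb.events;'"

printf '%s\n' "$DDL_CMD" "$COPY_CMD" "$REPAIR_CMD"
```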

Would you consider converting them to a HAR (Hadoop Archive) file?
Below is a reference for further reading.
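A HAR invocation could look like the sketch below (archive name, parent path, partitions, and destination are assumptions; `-p` sets the parent directory and the listed children are archived under it). The commands are printed for review, not executed:

```shell
# Pack two assumed 2017 partitions into a Hadoop Archive, then list it back
# through the har:// scheme.
HAR_CMD="hadoop archive -archiveName events_2017.har -p /user/hive/warehouse/mydb.db/events dt=2017-01 dt=2017-02 /backup/har"
LS_CMD="hdfs dfs -ls har:///backup/har/events_2017.har"

echo "$HAR_CMD"
echo "$LS_CMD"
```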