
HIVE - backup and restore

Rising Star

We have a hive database in one cluster. I want to have a copy of that database in a different cluster.

So what is the process to do this?

I'd appreciate a detailed step-by-step reply.

7 REPLIES

Master Mentor

@n c

You can dump the Hive database on the old cluster as described in the "Hive Metastore Database Backup and Restore" section of the following doc: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_command-line-upgrade/content/getting-rea...

Then copy the DB dump file to the new DB host, and manually edit the dump file to point to the new cluster's NameNode location. Next, create a new Hive database on the new DB host, import the DB dump, and apply the same privilege schema as on the previous database.

Before starting the Hive services, upgrade the Hive database using schemaTool. You can use metatool to update the HDFS locations to point at the new cluster. Then start the Hive services.
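A rough sketch of those commands, assuming MySQL as the metastore DB; the hostnames (old-nn, new-nn), the DB user, and the install paths below are placeholders, not taken from your environment:

# On the old DB host: dump the Hive metastore database
mysqldump -u hive -p hive > /tmp/hive_metastore.sql

# Point metadata locations at the new cluster's NameNode (hostnames are placeholders)
sed -i 's|hdfs://old-nn:8020|hdfs://new-nn:8020|g' /tmp/hive_metastore.sql

# On the new DB host: create the database and load the dump
mysql -u root -p -e "CREATE DATABASE hive"
mysql -u hive -p hive < /tmp/hive_metastore.sql

# Upgrade the schema before starting Hive, then (optionally) fix HDFS locations with metatool
/usr/hdp/current/hive-metastore/bin/schematool -dbType mysql -upgradeSchema
hive --service metatool -updateLocation hdfs://new-nn:8020 hdfs://old-nn:8020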

Rising Star

I have a MySQL dump file (an ASCII file full of SQL commands like CREATE DATABASE, CREATE TABLE, INSERT, etc.) and the table files copied directly from the Hive warehouse directory (these, I believe, are the actual table data files from Hive).

In the MySQL dump file there are two places where the hostname of the source Hive Metastore/HiveServer2/MySQL machine (all on the same host) is mentioned. I don't see any mention of the NameNode host in this file.

So were you referring to the Hive host when you said:

"edit dump file to point to new clusters NameNode location"

I think we can simply feed the file to MySQL and it will run the SQL commands and create the database, tables, data, etc.

But after that, how do I get the data in from the Hive data files?

Appreciate the insights.

Master Mentor

@n c

Hadoop DistCp (distributed copy) can be used to copy data between Hadoop clusters (and also within a Hadoop cluster). DistCp uses MapReduce to implement its distribution, error handling, and reporting. It expands a list of files and directories into map tasks, each of which copies a partition of the files specified in the source list.

- You can use Falcon to continuously sync Hive tables between clusters. It uses "distcp" under the covers for the data but also keeps track of added partitions, etc. It works for secure clusters as well; having them in the same Kerberos realm makes it easier. Some configuration is needed to add both clusters to the environment of the one running the "distcp".

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_Sys_Admin_Guides/content/ch_distcp.html
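For example, a basic DistCp run between the two clusters might look like the following; the NameNode hostnames and warehouse paths are placeholders:

hadoop distcp hdfs://old-nn:8020/apps/hive/warehouse/mydb.db hdfs://new-nn:8020/apps/hive/warehouse/mydb.db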

Some information if your clusters are HA: http://henning.kropponline.de/2015/03/15/distcp-two-ha-cluster/

And some info on Falcon Hive https://falcon.apache.org/HiveIntegration.html


Rising Star

I am aware of both distcp and falcon - but I am NOT talking about either of those.

I am talking about recovery/restore with a MySQL dump (hive database) and hive table files.

@n c

I hope both of your clusters are running the same version of Hadoop. Otherwise, you may run into issues.

Expert Contributor

@n c

I see that you have mentioned:

" We have a hive database in one cluster. I want to have a copy of that database in a different cluster."

One way to do this is to take a backup of the Hive database in MySQL.

1. Stop the Hive services. This is done to make sure that there are no new metadata updates to the Metastore.

2. On the node running MySQL, do the following (see the note after step 3 for a variant with explicit credentials):

mysqldump hive > /backup_folder/hive_backup.sql 

3. Start the Hive services again.
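As a note on step 2: depending on how MySQL authentication is set up, the dump usually needs credentials for the metastore DB user; the user name here is just an assumption:

mysqldump -u hive -p hive > /backup_folder/hive_backup.sql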

On the other node, where you want the MySQL backup to be restored, you need to install and configure MySQL. Refer to the following document (this is for HDP 2.5.6):

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.6/bk_command-line-installation/content/meet-m...

Once MySQL is set up, you can create a database in MySQL called "hive":

mysql> create database hive;

Now restore the DB dump that was taken earlier:

mysql hive < /backup_folder/hive_backup.sql 
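After the import, it can be worth checking that the restored schema version matches the Hive binaries on this node; a quick check, assuming an HDP-style install path:

/usr/hdp/current/hive-metastore/bin/schematool -dbType mysql -info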

You can use this node to run another instance of the Hive services, which can effectively act as HA, or simply use the MySQL on this node as a backup location.

Make regular backups with mysqldump and restore them on the other node.
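One simple way to make that regular is a cron entry on the MySQL node; the schedule and paths are placeholders, and credentials are assumed to come from ~/.my.cnf:

# nightly metastore dump at 02:00
0 2 * * * mysqldump hive > /backup_folder/hive_backup_$(date +\%F).sql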

Another way of achieving your requirement could be to set up HA for the MySQL database.

New Contributor

Is there a way to ensure that, after the migration using mysqldump, partitions are also moved to the new metastore?

I have tried moving the metastore from one environment to another, but I don't see partitions and stats. Any idea how I can address that?
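(If the partition locations in the restored metastore still point at the old cluster, the metatool command mentioned earlier in this thread can inspect and rewrite them; the hostnames below are placeholders:)

hive --service metatool -listFSRoot
hive --service metatool -updateLocation hdfs://new-nn:8020 hdfs://old-nn:8020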