We have a hive database in one cluster. I want to have a copy of that database in a different cluster.
So what is the process to do this?
I'd appreciate a detailed step-by-step reply.
You can dump the Hive database on the old cluster as described in the "Hive Metastore Database Backup and Restore" section of the following doc: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_command-line-upgrade/content/getting-rea...
Then copy the DB dump file to the new DB host machine and manually edit the dump file to point to the new cluster's NameNode location. Next, create a new Hive database on the new DB host, import the DB dump, and apply the same privileges schema as on the previous database.
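The dump-file edit can be scripted with sed. A minimal sketch, assuming hypothetical host names (`old-nn.example.com` and `new-nn.example.com` stand in for your actual NameNodes), shown here against a tiny stand-in file:

```shell
# Create a miniature stand-in for the metastore dump (hypothetical hosts/paths):
cat > /tmp/hive_backup.sql <<'EOF'
INSERT INTO SDS VALUES (1,'hdfs://old-nn.example.com:8020/apps/hive/warehouse/t1');
EOF

# Rewrite every old NameNode URI in the dump to the new cluster's NameNode:
sed -i 's|hdfs://old-nn.example.com:8020|hdfs://new-nn.example.com:8020|g' /tmp/hive_backup.sql
```

In practice you would run the same sed command against your real dump file before importing it.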
Before starting the Hive services, upgrade the Hive metastore schema using schemaTool. You can use metatool to update the HDFS locations to point to the new cluster. Then start the Hive services.
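The schemaTool/metatool step might look like the following sketch, run on the new cluster's Hive host. The NameNode host names are hypothetical placeholders; these commands only make sense against a live cluster:

```shell
# Upgrade the restored metastore schema to match the installed Hive version:
schematool -dbType mysql -upgradeSchema

# Rewrite HDFS locations stored in the metastore from the old NameNode
# to the new one (new location first, then old):
hive --service metatool -updateLocation \
  hdfs://new-nn.example.com:8020 \
  hdfs://old-nn.example.com:8020
```

You can run `metatool -listFSRoot` first to see which filesystem roots the metastore currently references.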
I have a MySQL dump file (an ASCII file of SQL commands such as CREATE DATABASE, CREATE TABLE, INSERT, etc.) and the table files copied directly from the Hive directory (these, I believe, are the actual table data files from Hive).
In the MySQL dump file there are two places where the hostname of the source hivemetastore/hiveserver2/MySQL (all on the same host) is mentioned. I don't see any mention of the NameNode host in this file.
So were you referring to the Hive host when you said:
"edit dump file to point to new clusters NameNode location"
I think we can simply feed the file to MySQL and it will run the SQL commands and create the database, tables, data, etc.
But after that, how do we get the data in from the Hive data files?
Appreciate the insights.
Hadoop DistCp (distributed copy) can be used to copy data between Hadoop clusters (and also within a Hadoop cluster). DistCp uses MapReduce to implement its distribution, error handling, and reporting. It expands a list of files and directories into map tasks, each of which copies a partition of the files specified in the source list.
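An inter-cluster copy with DistCp can be sketched as follows. The NameNode addresses and warehouse path are hypothetical; substitute your own, and note this only runs on an actual Hadoop cluster:

```shell
# Copy a Hive warehouse database directory from the source cluster
# to the destination cluster (runs as a MapReduce job):
hadoop distcp \
  hdfs://source-nn.example.com:8020/apps/hive/warehouse/mydb.db \
  hdfs://dest-nn.example.com:8020/apps/hive/warehouse/mydb.db
```

Adding `-update` makes repeated runs copy only files that changed, which is useful for keeping the destination in sync.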
- You can use Falcon to continuously sync Hive tables between clusters. It uses "distcp" under the covers for the data, but it also keeps track of added partitions, etc. For secure clusters, having both in the same Kerberos realm makes this easier. Some configuration is needed to add both clusters to the environment of the one running the "distcp".
Some information if your clusters are HA: http://henning.kropponline.de/2015/03/15/distcp-two-ha-cluster/
And some info on Falcon Hive https://falcon.apache.org/HiveIntegration.html
I am aware of both DistCp and Falcon, but I am NOT talking about either of those.
I am talking about recovery/restore with a MySQL dump (of the Hive database) and the Hive table files.
I see that you have mentioned:
" We have a hive database in one cluster. I want to have a copy of that database in a different cluster."
One way to do this is to take a backup of the Hive database in MySQL.
1. Stop the Hive services. This is done to make sure that there are no new metadata updates to the metastore.
2. On the node running MySQL, do the following:
mysqldump hive > /backup_folder/hive_backup.sql
3. Start the Hive services again.
On the other node, where you want the MySQL backup to be restored, you need to install and configure MySQL. Refer to the following document (this is for HDP 2.5.6):
Once MySQL is set up, you can create a database in MySQL called "hive":
mysql> create database hive;
Now restore the DB that was backed up earlier:
mysql hive < /backup_folder/hive_backup.sql
You can use this node to run another instance of the Hive services, which can be effective as HA, or simply use the MySQL instance on this node as a backup location.
Make regular backups with mysqldump and restore them on the other node.
The other way to achieve your requirement would be to set up HA for the MySQL database.
Is there a way to ensure that, after the migration using mysqldump, partitions are also moved to the new metastore?
I have tried moving the metastore from one environment to another, but I don't see partitions or stats. Any idea how I can address that?