Then copy the DB dump file to the new DB host machine and manually edit the dump file to point to the new cluster's NameNode location. Create a new Hive database on the new DB host, import the DB dump, and apply the same privilege schema as on the previous database.
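A rough sketch of those steps, using hypothetical names (old NameNode oldnn.example.com, new NameNode newnn.example.com, dump at /tmp/hive_metastore.sql, metastore database "hive"):

```bash
# Point every stored HDFS location in the dump at the new cluster's NameNode.
sed -i 's|hdfs://oldnn.example.com:8020|hdfs://newnn.example.com:8020|g' /tmp/hive_metastore.sql

# Create the database (skip if the dump already contains CREATE DATABASE)
# and import the dump on the new DB host.
mysql -u root -p -e "CREATE DATABASE hive;"
mysql -u root -p hive < /tmp/hive_metastore.sql

# Re-grant the metastore user the same privileges it had before
# (user name and password here are placeholders).
mysql -u root -p -e "GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive_password'; FLUSH PRIVILEGES;"
```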
Before starting the Hive services, upgrade the Hive database using schematool. You can use metatool to update the HDFS locations to the new cluster. Then start the Hive services.
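For example (hostnames are placeholders; this assumes a MySQL-backed metastore):

```bash
# Upgrade the metastore schema to match the Hive version on the new cluster.
schematool -dbType mysql -upgradeSchema

# Rewrite HDFS locations recorded in the metastore from the old
# NameNode URI to the new one.
hive --service metatool -updateLocation hdfs://newnn.example.com:8020 hdfs://oldnn.example.com:8020
```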
I have a MySQL dump file (an ASCII file filled with SQL commands like CREATE DATABASE, CREATE TABLE, INSERT, etc.) and the table files copied directly from the Hive warehouse directory (these, I believe, are the actual table data files from Hive).
In the MySQL dump file there are two places where the hostname of the source Hive Metastore/HiveServer2/MySQL machine (all on the same host) is mentioned. I don't see any mention of the host that runs the NameNode in this file.
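For reference, this is the kind of search I ran (the dump file name is a placeholder):

```bash
# Look for embedded HDFS URIs and for the old hostname in the dump.
grep -n 'hdfs://' /tmp/hive_metastore.sql | head
grep -n 'oldhost.example.com' /tmp/hive_metastore.sql | head
```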
So were you referring to the Hive host when you said:
"edit the dump file to point to the new cluster's NameNode location"?
I think we can simply feed the file to MySQL and it will run the SQL commands and create the database, tables, data, etc.
But after that, how do we get the data in from the Hive data files?
Hadoop DistCp (distributed copy) can be used to copy data between Hadoop clusters (and also within a Hadoop cluster). DistCp uses MapReduce to implement its distribution, error handling, and reporting. It expands a list of files and directories into map tasks, each of which copies a partition of the files specified in the source list.
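A minimal example, with placeholder NameNode hostnames and a warehouse path that depends on your distribution's defaults:

```bash
# Copy the Hive warehouse from the old cluster to the new one.
# Run on the destination cluster; hostnames and paths are placeholders.
hadoop distcp hdfs://oldnn.example.com:8020/apps/hive/warehouse \
              hdfs://newnn.example.com:8020/apps/hive/warehouse
```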
- You can use Falcon to continuously sync Hive tables between clusters. It uses DistCp under the covers for the data, but also keeps track of partitions as they are added, etc. For secure clusters, having both in the same Kerberos realm makes this easier. Some configuration is needed to add both clusters to the environment of the one running the DistCp jobs; see the sketch below.
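Very roughly, the setup amounts to registering a cluster entity for each side and scheduling the replication entity; the entity XML file names and the feed name here are placeholders for files you would write to describe your own clusters and tables:

```bash
# Register both clusters with Falcon (each XML describes one cluster's
# interfaces: NameNode, ResourceManager, Oozie, etc.).
falcon entity -type cluster -submit -file primary-cluster.xml
falcon entity -type cluster -submit -file backup-cluster.xml

# Submit and schedule the entity that mirrors the Hive table;
# Falcon runs DistCp under the covers on the configured cluster.
falcon entity -type feed -submit -file hive-table-replication.xml
falcon entity -type feed -schedule -name hive-table-replication
```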