
How to migrate a Hive table from one cluster to another cluster running on a different REALM?

Contributor
 
1 ACCEPTED SOLUTION

Master Mentor

@harish

Prerequisite: MUST DO

If you want to copy a Hive table across to another REALM, you need to set up cross-realm trust between the two MIT KDCs. This will enable the destination REALM user to obtain a valid Kerberos ticket to run operations on the source cluster.

Having said that, you should not forget to revise your Ranger policies to reflect the new REALM access privileges if the Ranger plugin is enabled on the source cluster, which I assume is the case since you leverage Ranger authorization. Here is a link to an HCC document that could help you set up the REALM trust
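For reference, the trust boils down to creating matching cross-realm ticket-granting principals on both KDCs. A minimal sketch, assuming MIT KDCs and hypothetical realm names SOURCE.REALM and DEST.REALM (the realms and commands below are illustrative; the principals must be created with the same password on both sides):

```shell
# Run on BOTH KDCs; use the same password for each principal on both sides.
kadmin.local -q "addprinc krbtgt/DEST.REALM@SOURCE.REALM"
kadmin.local -q "addprinc krbtgt/SOURCE.REALM@DEST.REALM"
```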


PROCEDURE

Follow the steps below to migrate a Hive database from one cluster to another:

1. Install Hive on the new cluster and make sure the Hive versions on the source and destination clusters are identical.

2. Transfer the data present in the Hive warehouse directory (/user/hive/warehouse) to the new Hadoop cluster.

hadoop distcp <src> <dst> 
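With the cross-realm trust in place, a concrete invocation might look like the sketch below. The hostnames, port, and principal are hypothetical; you kinit as a destination-realm user that the trust makes valid on the source cluster:

```shell
kinit hive@DEST.REALM
hadoop distcp \
  hdfs://old-namenode.source.example.com:8020/user/hive/warehouse \
  hdfs://new-namenode.dest.example.com:8020/user/hive/warehouse
```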

3. Take a backup of the Hive Metastore.

mysqldump hive > /tmp/mydir/backup_hive.sql

4. Install MySQL on the new Hadoop cluster.


5. Open the Hive MySQL-Metastore dump file and replace the source NameNode hostname with the destination hostname.

 hdfs://ip-address-old-namenode:port  ---> hdfs://ip-address-new-namenode:port 
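This step can be scripted with sed rather than done by hand in an editor. A minimal sketch on a stand-in dump file (the real file comes from step 3); the hostnames and port 8020 are placeholders, so substitute your actual values:

```shell
# Demonstration on a stand-in dump file; hostnames and port are placeholders.
mkdir -p /tmp/mydir
printf "LOCATION 'hdfs://ip-address-old-namenode:8020/user/hive/warehouse/t1'\n" > /tmp/mydir/backup_hive.sql
# Rewrite every old-NameNode URI to the new one, in place:
sed -i 's|hdfs://ip-address-old-namenode:8020|hdfs://ip-address-new-namenode:8020|g' /tmp/mydir/backup_hive.sql
grep 'namenode' /tmp/mydir/backup_hive.sql
```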

6. Restore the edited MySQL dump into the MySQL instance of the new Hadoop cluster.

 mysql hive < /tmp/mydir/backup_hive.sql

7. Configure Hive as usual and perform the Hive schema upgrade if needed.
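If the destination Hive version is newer than the source, the metastore schema can be checked and upgraded with schematool. A sketch assuming $HIVE_HOME points at your Hive installation (paths are environment-specific):

```shell
$HIVE_HOME/bin/schematool -dbType mysql -info           # report current schema version
$HIVE_HOME/bin/schematool -dbType mysql -upgradeSchema  # upgrade if it lags the binaries
```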

Impact

Hive metadata describes the database objects (databases, tables, partitions), while the actual table contents are stored in the Hadoop Distributed File System (HDFS). The metadata contains HDFS URIs and other details.

Therefore, when you migrate Hive from one cluster to another, you have to point the metadata to the HDFS of the new cluster. If you don't, it will still point to the HDFS of the old cluster and the migration will fail.
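One way to confirm the metadata now points at the new cluster is to query the restored metastore directly. DB_LOCATION_URI (in the DBS table) and LOCATION (in the SDS table) are the standard Hive metastore columns that hold HDFS URIs; this sketch assumes your MySQL client credentials are already configured:

```shell
mysql hive -e "SELECT DB_LOCATION_URI FROM DBS;"
mysql hive -e "SELECT LOCATION FROM SDS LIMIT 10;"
```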

In case of any failure, initialize the Hive Metastore of the destination cluster and resume the migration following the correct steps.

/bin/schematool -initSchema -dbType mysql 


On CDH

If you are on Cloudera CDH, you can instead use the Backup and Disaster Recovery (BDR) procedure for this migration.

HTH

