How to migrate a partitioned Hive database to a new cluster
Labels: Apache Hive
Created ‎11-23-2018 02:30 PM
Hi, we have a dev cluster with 5 nodes and a prod cluster with 5 nodes, both with Hive installed. I now want to migrate partitioned Hive tables from the dev cluster to the prod cluster.
Can someone help me properly migrate the tables and the metastore to the prod cluster?
Thanks in advance.
Created ‎11-25-2018 10:25 AM
@raja reddy
You can copy the HDFS files from your dev cluster to the prod cluster, re-create the Hive tables on the prod cluster, and then rebuild the partition metadata with the MSCK REPAIR TABLE command. To re-create the Hive tables, you can get the CREATE statement for each table by running SHOW CREATE TABLE <table_name> on your dev cluster.
The following are the high-level steps involved in a Hive migration:
- Use the distcp command to copy the data in the complete Hive warehouse database directory (/user/hive/warehouse) from the dev cluster to the prod cluster (see the combined sketch after this list).
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/administration/content/using_distcp.html
- Once the files are copied to the new prod cluster, take the DDL from the dev cluster (i.e., SHOW CREATE TABLE <table_name>) and create the Hive tables in the prod cluster.
https://community.hortonworks.com/articles/107762/how-to-extract-all-hive-tables-ddl.html
- Run a metastore check with MSCK REPAIR TABLE, which adds partition metadata to the Hive metastore for any partitions that don't already have it.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartiti...
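Putting the three steps together, here is a minimal sketch. The host names (dev-nn, prod-nn, dev-hs2, prod-hs2), the database name mydb, and the table name are placeholders, and it assumes the default /user/hive/warehouse location and a beeline connection that needs no extra authentication options; adjust for your environment.

```bash
# 1. Copy the database directory from the dev warehouse to the prod warehouse.
#    Run from a node that can reach both NameNodes.
hadoop distcp \
  hdfs://dev-nn:8020/user/hive/warehouse/mydb.db \
  hdfs://prod-nn:8020/user/hive/warehouse/mydb.db

# 2. Dump the CREATE TABLE statements from the dev cluster, one per table.
beeline -u "jdbc:hive2://dev-hs2:10000/mydb" --outputformat=tsv2 --showHeader=false \
  -e "SHOW TABLES;" > tables.txt
while read -r t; do
  beeline -u "jdbc:hive2://dev-hs2:10000/mydb" --outputformat=tsv2 --showHeader=false \
    -e "SHOW CREATE TABLE ${t};" >> create_tables.hql
  echo ";" >> create_tables.hql
done < tables.txt

# 3. Re-create the database and tables on prod, then let Hive discover the
#    partition directories that distcp copied over.
beeline -u "jdbc:hive2://prod-hs2:10000/default" -e "CREATE DATABASE IF NOT EXISTS mydb;"
beeline -u "jdbc:hive2://prod-hs2:10000/mydb" -f create_tables.hql
beeline -u "jdbc:hive2://prod-hs2:10000/mydb" -e "MSCK REPAIR TABLE my_partitioned_table;"
```

MSCK REPAIR TABLE has to be run once per partitioned table; non-partitioned tables only need the data files and the CREATE statement.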
If the clusters are Kerberized, some additional setup is needed before running distcp between secure clusters; refer to the distcp documentation for the details.
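As a rough illustration (assuming the dev and prod KDCs already have cross-realm trust configured, and with placeholder principal and keytab names), a Kerberized distcp run mainly needs a valid ticket before it starts:

```bash
# Obtain a Kerberos ticket; cross-realm trust between the dev and prod realms
# is assumed to already be in place.
kinit -kt /etc/security/keytabs/myuser.keytab myuser@DEV.REALM

# Then run distcp between the two secure clusters as usual.
hadoop distcp \
  hdfs://dev-nn:8020/user/hive/warehouse/mydb.db \
  hdfs://prod-nn:8020/user/hive/warehouse/mydb.db
```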
Note: there is no need for Hive EXPORT/IMPORT, because you can copy the data directly between the two clusters over HDFS.
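For comparison only, the EXPORT/IMPORT route that this note says you can skip would look roughly like the following (database, table, and path names are placeholders). EXPORT writes both the data and the metadata to an HDFS staging directory, which you would then distcp across and import on the other side:

```bash
# On the dev cluster: EXPORT writes the table's data and metadata to a staging dir.
beeline -u "jdbc:hive2://dev-hs2:10000/mydb" \
  -e "EXPORT TABLE my_partitioned_table TO '/tmp/export/my_partitioned_table';"

# Copy the staging directory to the prod cluster, then IMPORT it there.
hadoop distcp hdfs://dev-nn:8020/tmp/export hdfs://prod-nn:8020/tmp/export
beeline -u "jdbc:hive2://prod-hs2:10000/mydb" \
  -e "IMPORT TABLE my_partitioned_table FROM '/tmp/export/my_partitioned_table';"
```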
Please accept the answer you found most useful.
