How to migrate a partitioned Hive database to a new cluster
Labels: Apache Hive
Created ‎11-23-2018 02:30 PM
Hi, we have a dev cluster with 5 nodes and a prod cluster with 5 nodes, both with Hive installed. I now want to migrate partitioned Hive tables from the dev cluster to the prod cluster.
Can someone help me properly migrate the tables and the metastore to the prod cluster?
Thanks in advance.
Created ‎11-25-2018 10:25 AM
@raja reddy
You can copy the HDFS files from your dev cluster to the prod cluster, re-create the Hive tables on the prod cluster, and then rebuild the partition metadata with the MSCK REPAIR TABLE command. To re-create the Hive tables, you can get the CREATE statement for each table by running SHOW CREATE TABLE <table_name> on your dev cluster.
The following are the high-level steps involved in a Hive migration:
- Use the distcp command to copy the data in the complete Hive warehouse database directory (/user/hive/warehouse) from the dev cluster to the prod cluster (see the combined sketch after this list).
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/administration/content/using_distcp.html
- Once the files are copied to the new prod cluster, take the DDL from the dev cluster (i.e., SHOW CREATE TABLE <table_name>) and create the Hive tables in the prod cluster.
https://community.hortonworks.com/articles/107762/how-to-extract-all-hive-tables-ddl.html
- Run a metastore check with MSCK REPAIR TABLE, which adds partition metadata to the Hive metastore for any partitions that don't already have it.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartiti...
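Putting the three steps together, here is a minimal sketch. The host names (dev-nn, prod-nn, dev-hs2, prod-hs2), the database name mydb, and the table name are placeholders, and it assumes the default /user/hive/warehouse location and a beeline connection that needs no extra authentication options; adjust for your environment.

```bash
# 1. Copy the database directory from the dev warehouse to the prod warehouse.
#    Run from a node that can reach both NameNodes.
hadoop distcp \
  hdfs://dev-nn:8020/user/hive/warehouse/mydb.db \
  hdfs://prod-nn:8020/user/hive/warehouse/mydb.db

# 2. Dump the CREATE TABLE statements from the dev cluster, one per table.
beeline -u "jdbc:hive2://dev-hs2:10000/mydb" --outputformat=tsv2 --showHeader=false \
  -e "SHOW TABLES;" > tables.txt
while read -r t; do
  beeline -u "jdbc:hive2://dev-hs2:10000/mydb" --outputformat=tsv2 --showHeader=false \
    -e "SHOW CREATE TABLE ${t};" >> create_tables.hql
  echo ";" >> create_tables.hql
done < tables.txt

# 3. Re-create the database and tables on prod, then let Hive discover the
#    partition directories that distcp copied over.
beeline -u "jdbc:hive2://prod-hs2:10000/default" -e "CREATE DATABASE IF NOT EXISTS mydb;"
beeline -u "jdbc:hive2://prod-hs2:10000/mydb" -f create_tables.hql
beeline -u "jdbc:hive2://prod-hs2:10000/mydb" -e "MSCK REPAIR TABLE my_partitioned_table;"
```

MSCK REPAIR TABLE has to be run once per partitioned table; non-partitioned tables only need the data files and the CREATE statement.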
If the clusters are Kerberized, some additional setup is needed before running distcp between secure clusters; refer to the distcp documentation for the details.
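As a rough illustration (assuming the dev and prod KDCs already have cross-realm trust configured, and with placeholder principal and keytab names), a Kerberized distcp run mainly needs a valid ticket before it starts:

```bash
# Obtain a Kerberos ticket; cross-realm trust between the dev and prod realms
# is assumed to already be in place.
kinit -kt /etc/security/keytabs/myuser.keytab myuser@DEV.REALM

# Then run distcp between the two secure clusters as usual.
hadoop distcp \
  hdfs://dev-nn:8020/user/hive/warehouse/mydb.db \
  hdfs://prod-nn:8020/user/hive/warehouse/mydb.db
```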
Note: there is no need for Hive EXPORT/IMPORT, because you can copy the data directly between the two clusters over HDFS.
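For comparison only, the EXPORT/IMPORT route that this note says you can skip would look roughly like the following (database, table, and path names are placeholders). EXPORT writes both the data and the metadata to an HDFS staging directory, which you would then distcp across and import on the other side:

```bash
# On the dev cluster: EXPORT writes the table's data and metadata to a staging dir.
beeline -u "jdbc:hive2://dev-hs2:10000/mydb" \
  -e "EXPORT TABLE my_partitioned_table TO '/tmp/export/my_partitioned_table';"

# Copy the staging directory to the prod cluster, then IMPORT it there.
hadoop distcp hdfs://dev-nn:8020/tmp/export hdfs://prod-nn:8020/tmp/export
beeline -u "jdbc:hive2://prod-hs2:10000/mydb" \
  -e "IMPORT TABLE my_partitioned_table FROM '/tmp/export/my_partitioned_table';"
```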
Please accept the answer you found most useful.
