@raja reddy
You can copy the HDFS files from your dev cluster to the prod cluster, re-create the Hive tables on the prod cluster, and then rebuild the partition metadata with the MSCK REPAIR TABLE command. To re-create the Hive tables, you can get the CREATE statement for each table by running SHOW CREATE TABLE <table_name> on your dev cluster.
The following are the high-level steps involved in a Hive migration:
1. Use the distcp command to copy the database directories under the Hive warehouse (/user/hive/warehouse) from the dev cluster to the prod cluster (see the distcp sketch after this list).
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/administration/content/using_distcp.html
2. Once the files are on the prod cluster, take the DDL from the dev cluster (i.e., SHOW CREATE TABLE <table_name>) and create the Hive tables on the prod cluster (see the DDL extraction sketch below).
https://community.hortonworks.com/articles/107762/how-to-extract-all-hive-tables-ddl.html
3. Run a metastore check with MSCK REPAIR TABLE, which adds metadata to the Hive metastore for any partitions that exist on HDFS but are not yet registered there (see the repair example below).
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)
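For step 1, a minimal distcp sketch; the NameNode hostnames (dev-nn, prod-nn), the port, and the database name mydb are placeholders for your environment:

    # Copy one database directory from the dev warehouse to the prod warehouse.
    # dev-nn / prod-nn and port 8020 are placeholders for your NameNode addresses.
    hadoop distcp \
      hdfs://dev-nn:8020/user/hive/warehouse/mydb.db \
      hdfs://prod-nn:8020/user/hive/warehouse/mydb.db

Adding the -update option makes re-runs incremental, and -p preserves file attributes such as permissions, which matters if table locations have restrictive ACLs.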
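For step 2, a sketch of dumping the DDL for every table in a database on the dev cluster, assuming the hive command-line client is available; mydb and the output file name are placeholders:

    # List all tables in the database, then dump each table's CREATE statement.
    # -S runs the client in silent mode so only query output is written.
    for t in $(hive -S -e "USE mydb; SHOW TABLES;"); do
      echo "-- DDL for mydb.$t" >> mydb_ddl.sql
      hive -S -e "SHOW CREATE TABLE mydb.$t;" >> mydb_ddl.sql
      echo ";" >> mydb_ddl.sql
    done

You can then replay the collected statements on the prod cluster with hive -f mydb_ddl.sql (after creating the database itself). The linked article describes the same idea in more detail.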
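For step 3, once the data files and table definitions are in place on prod, repair each partitioned table; the table name here is a placeholder:

    # Register any partitions found on HDFS that the metastore doesn't know about.
    hive -e "MSCK REPAIR TABLE mydb.sales;"

Unpartitioned tables don't need this step; their data is visible as soon as the files are under the table's location.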
If the clusters are Kerberized, you can refer to the link below for configuring distcp between them (a minimal sketch follows).
https://community.hortonworks.com/content/supportkb/151079/configure-distcp-between-two-clusters-with-kerbero.html
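A minimal sketch for the Kerberized case, assuming cross-realm trust between the two KDCs is already configured as described in the linked article; the keytab path, principal, and hostnames are placeholders:

    # Authenticate as a user with read access on dev and write access on prod,
    # then run distcp as usual.
    kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@DEV.EXAMPLE.COM
    hadoop distcp \
      hdfs://dev-nn:8020/user/hive/warehouse/mydb.db \
      hdfs://prod-nn:8020/user/hive/warehouse/mydb.db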
Note: There's no need for Hive EXPORT/IMPORT here, because you can copy the data directly between the two clusters over HDFS. Please accept the answer you found most useful.