Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

how to migrate Hive data over to new cluster?

avatar
Super Collaborator

We have a Old Cluster with HDP 2.0.6 (Hive 0.12.0).. New Cluster with HDP 2.3.2 (Hive 1.2.1). We need to hive query the same tables in new environment and old environment after data forklift in same manner (implying database names, partitioning, etc. work). We migrated all the hdfs data over using distcp.

1 ACCEPTED SOLUTION

avatar

If its just a migration, you can make a copy of your old database and then point the HDP 2.3.2 cluster to the copy of your old database. Before starting the Hive services upgrade the hive database by using schemaTool. You can use metatool to update the HDFS locations to the new cluster. Start the hive services.

View solution in original post

10 REPLIES 10

avatar

If its just a migration, you can make a copy of your old database and then point the HDP 2.3.2 cluster to the copy of your old database. Before starting the Hive services upgrade the hive database by using schemaTool. You can use metatool to update the HDFS locations to the new cluster. Start the hive services.

avatar
Super Collaborator

Hi @Deepesh yes finally I was able to migrate using similar steps of db backup, restore & point. I will update the steps soon. Only applicable in clean env. Doesn't seem to be a clean approach in case there is already pre-existing database & tables in hive . We should have a way to extract ddl scripts for all tables / database either from MySQL or hive & run it on new env in hive.

avatar

@Saumil MayaniHDP2.3.2 has Hive 1.2.1? Did you mean HDP 2.2.x or HDP2.3.x?

avatar
Super Collaborator

yes @Pradeep, you are right .. It is hdp 2.3.2 & hive 1.2.1

avatar
Master Mentor

@Saumil Mayani

If you want to get ddl of tables

mysql -u hive -p -e " select concat(  'show create table ' , TBL_NAME,';') from TBLS" hive > file.sql  
remove header in file.sql
hive -f /tmp/file.sql

Details

[root@phdns02 ~]#

Table names

login to mysql

use hive;

select concat( 'show create table ' , TBL_NAME,';') from TBLS;

or

mysql -u hive -p -e " select concat( 'show create table ' , TBL_NAME,';') from TBLS" hive > file.sql

548-screen-shot-2015-11-20-at-101152-pm.png

You can add above output in shell and run this

[root@phdns02 ~]# su - hive

[hive@phdns02 ~]$ hive -f /tmp/h.ddl

WARNING: Use "yarn jar" to launch YARN applications.

Logging initialized using configuration in file:/etc/hive/2.3.4.0-3276/0/hive-log4j.properties

OK

CREATE TABLE `hadoop`(

`name` string)

ROW FORMAT SERDE

'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

'hdfs://phdns01.cloud.hortonworks.com:8020/apps/hive/warehouse/hadoop'

TBLPROPERTIES (

'transient_lastDdlTime'='1448062564')

Time taken: 2.045 seconds, Fetched: 12 row(s)

OK

CREATE TABLE `ns`(

`name` string)

ROW FORMAT SERDE

'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

'hdfs://phdns01.cloud.hortonworks.com:8020/apps/hive/warehouse/ns'

TBLPROPERTIES (

'transient_lastDdlTime'='1448062211')

Time taken: 0.067 seconds, Fetched: 12 row(s)

[hive@phdns02 ~]$

avatar
Super Collaborator

Neeraj Sabharwal This is great for fetching all tables from Mysql, however, To avoid Error "FAILED: SemanticException [Error 10001]: Table not found Table Name"

The following would work.

mysql -u hive -p -e "select concat( 'show create table ' , T.NAME , '.', T.TBL_NAME,';') from (select DBS.NAME, TBLS.TBL_NAME from TBLS left join DBS on TBLS.DB_ID = DBS.DB_ID) T" hive > /tmp/file.ddl
##remove header in file.sql
hive -f /tmp/file.ddl > tmp/create_table.ddl

avatar
Master Mentor

@Saumil Mayani nice! Def deserve upvote.

avatar
Rising Star

Does this include View DDLs?

avatar
Contributor

In case of oracle metastore , the query would to run from SQL Developer or TOAD would be :

SELECT 'show create table' || dbs.name || '.' || tbls.tbl_name || ';' FROM tbls LEFT JOIN dbs ON tbls.db_id = dbs.db_id