Support Questions

Find answers, ask questions, and share your expertise

Migrate Hive Metastore from MySQL to Postgres

Contributor

Hello,

I have a EMR cluster and the hive metastore is connected to MySQL RDS instance.

I am now moving to Hortonworks(v2.4.2 with Ambari 2.2) and with that I also wanted to move the hive metastore to Postgres RDS instance.

But whenever I migrate the data and try connecting the hive metastore to that schema, it throws the below error. It's working fine for the new database.

=================================

stderr: Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 245, in <module> HiveMetastore().execute() File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute method(env) File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 530, in restart self.start(env, upgrade_type=upgrade_type) File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 58, in start self.configure(env) File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 72, in configure hive(name = 'metastore') File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk return fn(*args, **kwargs) File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive.py", line 296, in hive user = params.hive_user File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__ self.env.run() File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run self.run_action(resource, action) File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action provider_action() File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 238, in action_run tries=self.resource.tries, try_sleep=self.resource.try_sleep) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner result = function(command, **kwargs) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call tries=tries, try_sleep=try_sleep) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper result = _call(command, **kwargs_copy) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call raise Fail(err_msg) resource_management.core.exceptions.Fail: Execution of 'export HIVE_CONF_DIR=/usr/hdp/current/hive-metastore/conf/conf.server ; /usr/hdp/current/hive-metastore/bin/schematool -initSchema -dbType postgres -userName root -passWord [PROTECTED]' returned 1. WARNING: Use "yarn jar" to launch YARN applications. Metastore connection URL: jdbc:postgresql://cirrus.c9xp5ox1vhs8.us-west-2.rds.amazonaws.com:5432/cirrus1 Metastore Connection Driver : org.postgresql.Driver Metastore connection User: root Starting metastore schema initialization to 1.2.1000 Initialization script hive-schema-1.2.1000.postgres.sql Error: ERROR: relation "BUCKETING_COLS" already exists (state=42P07,code=0) org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !! *** schemaTool failed *** stdout: 2016-07-13 08:55:20,106 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.4.2.0-258 2016-07-13 08:55:20,106 - Checking if need to create versioned conf dir /etc/hadoop/2.4.2.0-258/0

==========================================

1 ACCEPTED SOLUTION

@Rinku Singh

From below error, it seems like the Hive metastore is trying to run schemaTool -initSchema:

ERROR: relation "BUCKETING_COLS" already exists (state=42P07,code=0) org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !! *** schemaTool failed ***

Please run schemaTool -upgradeSchema -dbType <type> -dryRun and let me know the result.

View solution in original post

15 REPLIES 15

@Rinku Singh

Did you try running schematool to upgrade Metastore?

For example: $ /usr/hdp/current/hive-metastore/bin/schematool -upgradeSchema -dbType mysql

Thanks and Regards,

Sindhu

Contributor

Hi @Sindhu

But why should we run that command. I am not upgrading the metastore but migrating it from MySql to Postgres. Does this still needs to be run?

Best Regards,

Rinku Singh

Contributor

Hi @Sindhu

Just to give more background. I created a new database in postgres which in turn creates a new schema called 'public' with no tables in it.

As soon as the connection is made by hive for the first time, it internally runs scripts and creates around 56 tables in the public schema.

This applies to Mysql db as well.

The issue comes when I have already migrated those 56 tables present in MySQL database to postgres database and then try to connect, I get the above error.

Best Regards,

Rinku Singh.

@Rinku Singh

From below error, it seems like the Hive metastore is trying to run schemaTool -initSchema:

ERROR: relation "BUCKETING_COLS" already exists (state=42P07,code=0) org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !! *** schemaTool failed ***

Please run schemaTool -upgradeSchema -dbType <type> -dryRun and let me know the result.

@Rinku Singh

When you say 56 tables present in MySQL database to postgres database, to which database the tables were migrated to on postgres?

What is the jdbc connection URL under hive-site.xml?

Contributor

Hi @Sindhu

Let me put this in much clear way. There are two databases.

1. I created a database called Test. I then started the Hive service and it came up properly. I then saw that 56 tables were created which were not present earlier. It's obvious that it would have run the schemaTool -initSchema command.

2. Later I created one more database and I copied the 56 tables from MySQL to Postgres(in the new database). Changed the database in Ambari to point to this new database and tried restarting the Hive service. Then I got this error.

True that the Hive metastore is trying to run schemaTool -initSchema command in second scenario as well and I want to understand what triggers it to run the schemaTool -initSchema command and how can I resolve it so that I can successfully migrate the metastore and start the Hive service.

Best Regards,

Rinku Singh.

@Rinku Singh

Run the below command and share the result:

schematool -info -dbType postgres -userName hive -passWord hive -verbose

Contributor

Hi @Sindhu

I ran the above command and found that the database schema version is not compatible with hive version

==========================================

[rsingh01]$ ./schematool -info -dbType postgres -userName root -passWord password -verbose WARNING: Use "yarn jar" to launch YARN applications. Metastore connection URL: jdbc:postgresql://xxxxxxxxxxxxxxxxxxxxxx/xxxxx Metastore Connection Driver : org.postgresql.Driver Metastore connection User: root Hive distribution version: 1.2.1000 Metastore schema version: 1.2.0 org.apache.hadoop.hive.metastore.HiveMetaException: Metastore schema version is not compatible. Hive Version: 1.2.1000, Database Schema Version: 1.2.0 org.apache.hadoop.hive.metastore.HiveMetaException: Metastore schema version is not compatible. Hive Version: 1.2.1000, Database Schema Version: 1.2.0 at org.apache.hive.beeline.HiveSchemaTool.assertCompatibleVersion(HiveSchemaTool.java:196) at org.apache.hive.beeline.HiveSchemaTool.showInfo(HiveSchemaTool.java:140) at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:501) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) *** schemaTool failed ***

============================================

I then tried upgrading the database schema version by running the below command. The dryRun is going through successfully but now I am stuck when I am doing the actual execution of the script

=============================================

[rsingh01]$ ./schematool -dbType postgres -userName root -passWord password -upgradeSchema -dryRun WARNING: Use "yarn jar" to launch YARN applications. Metastore connection URL: jdbc:postgresql://xxxxxxxxxxxxxxxxxxxxxx/xxxxx Metastore Connection Driver : org.postgresql.Driver Metastore connection User: root Starting upgrade metastore schema from version 1.2.0 to 1.2.1000 Upgrade script upgrade-1.2.0-to-1.2.1000.postgres.sql schemaTool completed

[rsingh01]$ ./schematool -dbType postgres -userName root -passWord password -upgradeSchema WARNING: Use "yarn jar" to launch YARN applications. Metastore connection URL: jdbc:postgresql://xxxxxxxxxxxxxxxxxxxxxx/xxxxx Metastore Connection Driver : org.postgresql.Driver Metastore connection User: root Starting upgrade metastore schema from version 1.2.0 to 1.2.1000 Upgrade script upgrade-1.2.0-to-1.2.1000.postgres.sql Error: ERROR: relation "compaction_queue" does not exist (state=42P01,code=0) org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !! *** schemaTool failed ***

Contributor

Hi @Sindhu

I found the reason why it says --- relation "compaction_queue" does not exist

In postgres, the command will not work:

ALTER TABLE COMPACTION_QUEUE ADD COLUMN CQ_HIGHEST_TXN_ID bigint;

But the below command will work:

ALTER TABLE "COMPACTION_QUEUE" ADD COLUMN "CQ_HIGHEST_TXN_ID" bigint;

The table name needs to be in double quotes. As all the upgrade scripts are written normally without double quotes, it will not work.

Trying to figure out a way how we can upgrade the database schema version with all these restrictions in postgres.

Best Regards,

Rinku Singh.

Contributor

Hello @Sindhu

I did the following.

a. Created a new MySQL instance and migrated the tables into this new database

b. Pointed the hive metastore to the new MySQL instance. As expected, the hive metastore service didn't come up.

c. I ran the upgrade script command to upgrade the MySQL database schema.

d. I restarted the hive metastore service and it came up.

e. Now as the schema versions are same, I migrated the tables from the new MySQL database to Postgres database.

f. I am now trying to start the hiveserver2 but it is not starting but I am able to login into hive command prompt.

g. I also ran the following command

[rsingh01@ip-172-31-31-251 bin]$ ./schematool -info -dbType postgres -userName root -passWord password -verbose WARNING: Use "yarn jar" to launch YARN applications. Metastore connection URL: jdbc:postgresql://xxxxxxxxxxxxxxxxxxxxxxxxx/cirrus3 Metastore Connection Driver : org.postgresql.Driver Metastore connection User: root Hive distribution version: 1.2.1000 Metastore schema version: 1.2.1000 schemaTool completed

h. I tried querying the tables but now it is showing the following error:

hive> select * from rms_customer; FAILED: SemanticException Unable to determine if hdfs://xxxxxxxxxxxxxxxxxxxxxxxxxxxx:8020/user/hive/warehouse/rms.db/rms_customer is encrypted: java.lang.IllegalArgumentException: Wrong FS: hdfs://xxxxxxxxxxxxxxxxxxxxxxxxxxxx:8020/user/hive/warehouse/rms.db/rms_customer, expected: hdfs://xxxxxxxxxxxxxxxxxxxxxxxxxxxx:8020

h. I followed the link https://issues.apache.org/jira/browse/HIVE-11116 and tried to list the FSRoot by running the command metatool -listFSRoot but this time it is throwing the error:

===========================

[rsingh01@ip-172-31-31-251 bin]$ ./metatool -listFSRoot WARNING: Use "yarn jar" to launch YARN applications. Initializing HiveMetaTool.. 16/07/14 10:05:43 INFO metastore.ObjectStore: ObjectStore, initialize called 16/07/14 10:05:43 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored 16/07/14 10:05:43 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored 16/07/14 10:05:44 ERROR Datastore.Schema: Failed initialising database. Unable to open a test connection to the given database. JDBC url = jdbc:postgresql://xxxxxxxxxxxxxxxxxxxxxxxx/cirrus1, username = root. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------ org.postgresql.util.PSQLException: FATAL: password authentication failed for user "root"

===========================

I am totally stuck now

Best Regards,

Rinku Singh.

@Rinku Singh

Error:

ERROR Datastore.Schema: Failed initialising database. Unable to open a test connection to the given database. JDBC url = jdbc:postgresql://xxxxxxxxxxxxxxxxxxxxxxxx/cirrus1, username = root.

Seems like issue with test connectivity. Can you please try testing connectivity using a java program?

https://community.hortonworks.com/content/kbentry/25410/simple-steps-to-test-hive-jdbc-connect.html

Thanks and Regards,

Sindhu

Contributor

Hi @Sindhu

I finally had to add the parameter javax.jdo.option.ConnectionPassword in the hive-site.xml and re-run it. Fixed all the issues and everything is working now.

Thank you so much for your help and pointers.

Best Regards,

Rinku Singh.

@Rinku Singh

Glad to be of some help. Please mark best answer to end the discussion.

New Contributor

Hi @Rinku Singh, @Sindhu,

We're migrating our cluster to new hardware. TBs of HDFS (/Data, /user & /apps/hive/warehouse) data transferred well to destination. Now, we need to move the Hive tables Metastore. There are 10s of DBs and each would have 100s of tables. The specifications are as follows:

Ambari: 2.5.0.3 (Old), 2.6.1.5 (New)

HDP: 2.6.0.3 (Old), 2.6.4.0 (New)

HIVE DB: MySQL (Old), Postgres (New)

Issue: We don't have the DBs schema so that we can run again on new cluster. It would be very tedious to Create Table from one to another. Is there a way to import the metadata dump from one to another?

Please advise!

Thanks allot in advance!

Regards.

New Contributor

Hi @Rinku Singh, @Sindhu,

We're migrating our cluster to new hardware. TBs of HDFS (/Data, /user & /apps/hive/warehouse) data transferred well to destination. Now, we need to move the Hive tables Metastore. There are 10s of DBs and each would have 100s of tables. The specifications are as follows:

Ambari: 2.5.0.3 (Old), 2.6.1.5 (New)

HDP: 2.6.0.3 (Old), 2.6.4.0 (New)

HIVE DB: MySQL (Old), Postgres (New)

Issue: We don't have the DBs schema so that we can run again on new cluster. It would be very tedious to Create Table from one to another. Is there a way to import the metadata dump from one to another?

Please advise!

Thanks allot in advance!

Regards.

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.