Support Questions

Find answers, ask questions, and share your expertise

Migrate Hive Metastore from MySQL to Postgres

avatar
Rising Star

Hello,

I have a EMR cluster and the hive metastore is connected to MySQL RDS instance.

I am now moving to Hortonworks(v2.4.2 with Ambari 2.2) and with that I also wanted to move the hive metastore to Postgres RDS instance.

But whenever I migrate the data and try connecting the hive metastore to that schema, it throws the below error. It's working fine for the new database.

=================================

stderr: Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 245, in <module> HiveMetastore().execute() File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute method(env) File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 530, in restart self.start(env, upgrade_type=upgrade_type) File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 58, in start self.configure(env) File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 72, in configure hive(name = 'metastore') File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk return fn(*args, **kwargs) File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive.py", line 296, in hive user = params.hive_user File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__ self.env.run() File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run self.run_action(resource, action) File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action provider_action() File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 238, in action_run tries=self.resource.tries, try_sleep=self.resource.try_sleep) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner result = function(command, **kwargs) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call tries=tries, try_sleep=try_sleep) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper result = _call(command, **kwargs_copy) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call raise Fail(err_msg) resource_management.core.exceptions.Fail: Execution of 'export HIVE_CONF_DIR=/usr/hdp/current/hive-metastore/conf/conf.server ; /usr/hdp/current/hive-metastore/bin/schematool -initSchema -dbType postgres -userName root -passWord [PROTECTED]' returned 1. WARNING: Use "yarn jar" to launch YARN applications. Metastore connection URL: jdbc:postgresql://cirrus.c9xp5ox1vhs8.us-west-2.rds.amazonaws.com:5432/cirrus1 Metastore Connection Driver : org.postgresql.Driver Metastore connection User: root Starting metastore schema initialization to 1.2.1000 Initialization script hive-schema-1.2.1000.postgres.sql Error: ERROR: relation "BUCKETING_COLS" already exists (state=42P07,code=0) org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !! *** schemaTool failed *** stdout: 2016-07-13 08:55:20,106 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.4.2.0-258 2016-07-13 08:55:20,106 - Checking if need to create versioned conf dir /etc/hadoop/2.4.2.0-258/0

==========================================

1 ACCEPTED SOLUTION

avatar

@Rinku Singh

From below error, it seems like the Hive metastore is trying to run schemaTool -initSchema:

ERROR: relation "BUCKETING_COLS" already exists (state=42P07,code=0) org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !! *** schemaTool failed ***

Please run schemaTool -upgradeSchema -dbType <type> -dryRun and let me know the result.

View solution in original post

15 REPLIES 15

avatar
Rising Star

Hello @Sindhu

I did the following.

a. Created a new MySQL instance and migrated the tables into this new database

b. Pointed the hive metastore to the new MySQL instance. As expected, the hive metastore service didn't come up.

c. I ran the upgrade script command to upgrade the MySQL database schema.

d. I restarted the hive metastore service and it came up.

e. Now as the schema versions are same, I migrated the tables from the new MySQL database to Postgres database.

f. I am now trying to start the hiveserver2 but it is not starting but I am able to login into hive command prompt.

g. I also ran the following command

[rsingh01@ip-172-31-31-251 bin]$ ./schematool -info -dbType postgres -userName root -passWord password -verbose WARNING: Use "yarn jar" to launch YARN applications. Metastore connection URL: jdbc:postgresql://xxxxxxxxxxxxxxxxxxxxxxxxx/cirrus3 Metastore Connection Driver : org.postgresql.Driver Metastore connection User: root Hive distribution version: 1.2.1000 Metastore schema version: 1.2.1000 schemaTool completed

h. I tried querying the tables but now it is showing the following error:

hive> select * from rms_customer; FAILED: SemanticException Unable to determine if hdfs://xxxxxxxxxxxxxxxxxxxxxxxxxxxx:8020/user/hive/warehouse/rms.db/rms_customer is encrypted: java.lang.IllegalArgumentException: Wrong FS: hdfs://xxxxxxxxxxxxxxxxxxxxxxxxxxxx:8020/user/hive/warehouse/rms.db/rms_customer, expected: hdfs://xxxxxxxxxxxxxxxxxxxxxxxxxxxx:8020

h. I followed the link https://issues.apache.org/jira/browse/HIVE-11116 and tried to list the FSRoot by running the command metatool -listFSRoot but this time it is throwing the error:

===========================

[rsingh01@ip-172-31-31-251 bin]$ ./metatool -listFSRoot WARNING: Use "yarn jar" to launch YARN applications. Initializing HiveMetaTool.. 16/07/14 10:05:43 INFO metastore.ObjectStore: ObjectStore, initialize called 16/07/14 10:05:43 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored 16/07/14 10:05:43 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored 16/07/14 10:05:44 ERROR Datastore.Schema: Failed initialising database. Unable to open a test connection to the given database. JDBC url = jdbc:postgresql://xxxxxxxxxxxxxxxxxxxxxxxx/cirrus1, username = root. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------ org.postgresql.util.PSQLException: FATAL: password authentication failed for user "root"

===========================

I am totally stuck now

Best Regards,

Rinku Singh.

avatar
@Rinku Singh

Error:

ERROR Datastore.Schema: Failed initialising database. Unable to open a test connection to the given database. JDBC url = jdbc:postgresql://xxxxxxxxxxxxxxxxxxxxxxxx/cirrus1, username = root.

Seems like issue with test connectivity. Can you please try testing connectivity using a java program?

https://community.hortonworks.com/content/kbentry/25410/simple-steps-to-test-hive-jdbc-connect.html

Thanks and Regards,

Sindhu

avatar
Rising Star

Hi @Sindhu

I finally had to add the parameter javax.jdo.option.ConnectionPassword in the hive-site.xml and re-run it. Fixed all the issues and everything is working now.

Thank you so much for your help and pointers.

Best Regards,

Rinku Singh.

avatar

@Rinku Singh

Glad to be of some help. Please mark best answer to end the discussion.

avatar
Explorer

Hi @Rinku Singh, @Sindhu,

We're migrating our cluster to new hardware. TBs of HDFS (/Data, /user & /apps/hive/warehouse) data transferred well to destination. Now, we need to move the Hive tables Metastore. There are 10s of DBs and each would have 100s of tables. The specifications are as follows:

Ambari: 2.5.0.3 (Old), 2.6.1.5 (New)

HDP: 2.6.0.3 (Old), 2.6.4.0 (New)

HIVE DB: MySQL (Old), Postgres (New)

Issue: We don't have the DBs schema so that we can run again on new cluster. It would be very tedious to Create Table from one to another. Is there a way to import the metadata dump from one to another?

Please advise!

Thanks allot in advance!

Regards.

avatar
Explorer

Hi @Rinku Singh, @Sindhu,

We're migrating our cluster to new hardware. TBs of HDFS (/Data, /user & /apps/hive/warehouse) data transferred well to destination. Now, we need to move the Hive tables Metastore. There are 10s of DBs and each would have 100s of tables. The specifications are as follows:

Ambari: 2.5.0.3 (Old), 2.6.1.5 (New)

HDP: 2.6.0.3 (Old), 2.6.4.0 (New)

HIVE DB: MySQL (Old), Postgres (New)

Issue: We don't have the DBs schema so that we can run again on new cluster. It would be very tedious to Create Table from one to another. Is there a way to import the metadata dump from one to another?

Please advise!

Thanks allot in advance!

Regards.