Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Migrate Hive Metastore from MySQL to Postgres

avatar
Rising Star

Hello,

I have a EMR cluster and the hive metastore is connected to MySQL RDS instance.

I am now moving to Hortonworks(v2.4.2 with Ambari 2.2) and with that I also wanted to move the hive metastore to Postgres RDS instance.

But whenever I migrate the data and try connecting the hive metastore to that schema, it throws the below error. It's working fine for the new database.

=================================

stderr: Traceback (most recent call last): File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 245, in <module> HiveMetastore().execute() File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute method(env) File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 530, in restart self.start(env, upgrade_type=upgrade_type) File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 58, in start self.configure(env) File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_metastore.py", line 72, in configure hive(name = 'metastore') File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk return fn(*args, **kwargs) File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive.py", line 296, in hive user = params.hive_user File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__ self.env.run() File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run self.run_action(resource, action) File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action provider_action() File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 238, in action_run tries=self.resource.tries, try_sleep=self.resource.try_sleep) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner result = function(command, **kwargs) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call tries=tries, try_sleep=try_sleep) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper result = _call(command, **kwargs_copy) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call raise Fail(err_msg) resource_management.core.exceptions.Fail: Execution of 'export HIVE_CONF_DIR=/usr/hdp/current/hive-metastore/conf/conf.server ; /usr/hdp/current/hive-metastore/bin/schematool -initSchema -dbType postgres -userName root -passWord [PROTECTED]' returned 1. WARNING: Use "yarn jar" to launch YARN applications. Metastore connection URL: jdbc:postgresql://cirrus.c9xp5ox1vhs8.us-west-2.rds.amazonaws.com:5432/cirrus1 Metastore Connection Driver : org.postgresql.Driver Metastore connection User: root Starting metastore schema initialization to 1.2.1000 Initialization script hive-schema-1.2.1000.postgres.sql Error: ERROR: relation "BUCKETING_COLS" already exists (state=42P07,code=0) org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !! *** schemaTool failed *** stdout: 2016-07-13 08:55:20,106 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.4.2.0-258 2016-07-13 08:55:20,106 - Checking if need to create versioned conf dir /etc/hadoop/2.4.2.0-258/0

==========================================

1 ACCEPTED SOLUTION

avatar

@Rinku Singh

From below error, it seems like the Hive metastore is trying to run schemaTool -initSchema:

ERROR: relation "BUCKETING_COLS" already exists (state=42P07,code=0) org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !! *** schemaTool failed ***

Please run schemaTool -upgradeSchema -dbType <type> -dryRun and let me know the result.

View solution in original post

15 REPLIES 15

avatar
@Rinku Singh

Did you try running schematool to upgrade Metastore?

For example: $ /usr/hdp/current/hive-metastore/bin/schematool -upgradeSchema -dbType mysql

Thanks and Regards,

Sindhu

avatar
Rising Star

Hi @Sindhu

But why should we run that command. I am not upgrading the metastore but migrating it from MySql to Postgres. Does this still needs to be run?

Best Regards,

Rinku Singh

avatar
Rising Star

Hi @Sindhu

Just to give more background. I created a new database in postgres which in turn creates a new schema called 'public' with no tables in it.

As soon as the connection is made by hive for the first time, it internally runs scripts and creates around 56 tables in the public schema.

This applies to Mysql db as well.

The issue comes when I have already migrated those 56 tables present in MySQL database to postgres database and then try to connect, I get the above error.

Best Regards,

Rinku Singh.

avatar

@Rinku Singh

From below error, it seems like the Hive metastore is trying to run schemaTool -initSchema:

ERROR: relation "BUCKETING_COLS" already exists (state=42P07,code=0) org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !! *** schemaTool failed ***

Please run schemaTool -upgradeSchema -dbType <type> -dryRun and let me know the result.

avatar
@Rinku Singh

When you say 56 tables present in MySQL database to postgres database, to which database the tables were migrated to on postgres?

What is the jdbc connection URL under hive-site.xml?

avatar
Rising Star

Hi @Sindhu

Let me put this in much clear way. There are two databases.

1. I created a database called Test. I then started the Hive service and it came up properly. I then saw that 56 tables were created which were not present earlier. It's obvious that it would have run the schemaTool -initSchema command.

2. Later I created one more database and I copied the 56 tables from MySQL to Postgres(in the new database). Changed the database in Ambari to point to this new database and tried restarting the Hive service. Then I got this error.

True that the Hive metastore is trying to run schemaTool -initSchema command in second scenario as well and I want to understand what triggers it to run the schemaTool -initSchema command and how can I resolve it so that I can successfully migrate the metastore and start the Hive service.

Best Regards,

Rinku Singh.

avatar

@Rinku Singh

Run the below command and share the result:

schematool -info -dbType postgres -userName hive -passWord hive -verbose

avatar
Rising Star

Hi @Sindhu

I ran the above command and found that the database schema version is not compatible with hive version

==========================================

[rsingh01]$ ./schematool -info -dbType postgres -userName root -passWord password -verbose WARNING: Use "yarn jar" to launch YARN applications. Metastore connection URL: jdbc:postgresql://xxxxxxxxxxxxxxxxxxxxxx/xxxxx Metastore Connection Driver : org.postgresql.Driver Metastore connection User: root Hive distribution version: 1.2.1000 Metastore schema version: 1.2.0 org.apache.hadoop.hive.metastore.HiveMetaException: Metastore schema version is not compatible. Hive Version: 1.2.1000, Database Schema Version: 1.2.0 org.apache.hadoop.hive.metastore.HiveMetaException: Metastore schema version is not compatible. Hive Version: 1.2.1000, Database Schema Version: 1.2.0 at org.apache.hive.beeline.HiveSchemaTool.assertCompatibleVersion(HiveSchemaTool.java:196) at org.apache.hive.beeline.HiveSchemaTool.showInfo(HiveSchemaTool.java:140) at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:501) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) *** schemaTool failed ***

============================================

I then tried upgrading the database schema version by running the below command. The dryRun is going through successfully but now I am stuck when I am doing the actual execution of the script

=============================================

[rsingh01]$ ./schematool -dbType postgres -userName root -passWord password -upgradeSchema -dryRun WARNING: Use "yarn jar" to launch YARN applications. Metastore connection URL: jdbc:postgresql://xxxxxxxxxxxxxxxxxxxxxx/xxxxx Metastore Connection Driver : org.postgresql.Driver Metastore connection User: root Starting upgrade metastore schema from version 1.2.0 to 1.2.1000 Upgrade script upgrade-1.2.0-to-1.2.1000.postgres.sql schemaTool completed

[rsingh01]$ ./schematool -dbType postgres -userName root -passWord password -upgradeSchema WARNING: Use "yarn jar" to launch YARN applications. Metastore connection URL: jdbc:postgresql://xxxxxxxxxxxxxxxxxxxxxx/xxxxx Metastore Connection Driver : org.postgresql.Driver Metastore connection User: root Starting upgrade metastore schema from version 1.2.0 to 1.2.1000 Upgrade script upgrade-1.2.0-to-1.2.1000.postgres.sql Error: ERROR: relation "compaction_queue" does not exist (state=42P01,code=0) org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !! *** schemaTool failed ***

avatar
Rising Star

Hi @Sindhu

I found the reason why it says --- relation "compaction_queue" does not exist

In postgres, the command will not work:

ALTER TABLE COMPACTION_QUEUE ADD COLUMN CQ_HIGHEST_TXN_ID bigint;

But the below command will work:

ALTER TABLE "COMPACTION_QUEUE" ADD COLUMN "CQ_HIGHEST_TXN_ID" bigint;

The table name needs to be in double quotes. As all the upgrade scripts are written normally without double quotes, it will not work.

Trying to figure out a way how we can upgrade the database schema version with all these restrictions in postgres.

Best Regards,

Rinku Singh.