Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

create_database race condition in Hive metastore

create_database race condition in Hive metastore

New Contributor

Hi,

we are using HDP 3.1 with hive metastore running on postgres. we are trying to upgrade our setup from HDP2.6 to 3.1. We have ETL on spark over hdfs backed with hivemetastore. We found a race condition problem which didn't happen in HDP3.1. When a database is not present and we run a spark sql "CREATE DATABASE IF NOT EXISTS {{databaseName}}" from two different spark applications at the same time, sometimes we enter in to a race condition and one of the threads is going into error state. The following is the log in hive metastore of one thread.

 

 

2019-12-19T10:26:31,964 INFO  [pool-6-thread-198]: metastore.ObjectStore$RetryingExecutor (ObjectStore.java:run(9966)) - Attempting to acquire the DB log notification lock: 0 out of 10 retries
javax.jdo.JDODataStoreException: Error executing SQL query "select "NEXT_EVENT_ID" from "NOTIFICATION_SEQUENCE" for update".
        at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543) ~[datanucleus-api-jdo-4.2.4.jar:?]
        at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:391) ~[datanucleus-api-jdo-4.2.4.jar:?]
        at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:216) ~[datanucleus-api-jdo-4.2.4.jar:?]
        at org.apache.hadoop.hive.metastore.ObjectStore.lambda$lockForUpdate$0(ObjectStore.java:9936) ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at org.apache.hadoop.hive.metastore.ObjectStore$RetryingExecutor.run(ObjectStore.java:9963) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at org.apache.hadoop.hive.metastore.ObjectStore.lockForUpdate(ObjectStore.java:9938) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at org.apache.hadoop.hive.metastore.ObjectStore.addNotificationEvent(ObjectStore.java:10002) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_191]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191]
        at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at com.sun.proxy.$Proxy29.addNotificationEvent(Unknown Source) [?:?]
        at org.apache.hive.hcatalog.listener.DbNotificationListener.process(DbNotificationListener.java:968) [hive-hcatalog-server-extensions-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at org.apache.hive.hcatalog.listener.DbNotificationListener.onCreateDatabase(DbNotificationListener.java:380) [hive-hcatalog-server-extensions-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier$22.notify(MetaStoreListenerNotifier.java:81) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:273) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database_core(HiveMetaStore.java:1343) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:1425) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_191]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191]
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at com.sun.proxy.$Proxy31.create_database(Unknown Source) [?:?]
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_database.getResult(ThriftHiveMetastore.java:14861) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_database.getResult(ThriftHiveMetastore.java:14845) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at java.security.AccessController.doPrivileged(Native Method) [?:1.8.0_191]
        at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_191]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) [hadoop-common-3.1.1.3.1.0.0-78.jar:?]
        at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
Caused by: java.sql.BatchUpdateException: Batch entry 0 INSERT INTO "DBS" ("DB_ID","CTLG_NAME","DESC","DB_LOCATION_URI","NAME","OWNER_NAME","OWNER_TYPE") VALUES (578,'spark','','hdfs://ip-10-99-135-238.ec2.internal:8020/apps/spark/warehouse/user_management__hive_intermediate_15759__full.db','user_management__hive_intermediate_15759__full',NULL,'USER') was aborted: ERROR: duplicate key value violates unique constraint "UNIQUE_DATABASE"
  Detail: Key ("NAME", "CTLG_NAME")=(user_management__hive_intermediate_15759__full, spark) already exists.  Call getNextException to see other errors in the batch.
        at org.postgresql.jdbc.BatchResultHandler.handleError(BatchResultHandler.java:148) ~[postgresql-jdbc3.jar:42.2.5]
        at org.postgresql.core.ResultHandlerDelegate.handleError(ResultHandlerDelegate.java:50) ~[postgresql-jdbc3.jar:42.2.5]
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2184) ~[postgresql-jdbc3.jar:42.2.5]
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:481) ~[postgresql-jdbc3.jar:42.2.5]
        at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:840) ~[postgresql-jdbc3.jar:42.2.5]
        at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1538) ~[postgresql-jdbc3.jar:42.2.5]
        at com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:125) ~[HikariCP-2.6.1.jar:?]
        at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java) ~[HikariCP-2.6.1.jar:?]
        at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:366) ~[datanucleus-rdbms-4.1.19.jar:?]
        at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:676) ~[datanucleus-rdbms-4.1.19.jar:?]
        at org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:319) ~[datanucleus-rdbms-4.1.19.jar:?]
        at org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getPreparedStatementForQuery(RDBMSQueryUtils.java:211) ~[datanucleus-rdbms-4.1.19.jar:?]
        at org.datanucleus.store.rdbms.query.SQLQuery.performExecute(SQLQuery.java:633) ~[datanucleus-rdbms-4.1.19.jar:?]
        at org.datanucleus.store.query.Query.executeQuery(Query.java:1855) ~[datanucleus-core-4.1.17.jar:?]

 

 

  This tries 10 times (hive.notification.sequence.lock.max.retries)  with a gap of 30 seconds and after that it fails permanently with the following error.

 

 

        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) [hadoop-common-3.1.1.3.1.0.0-78.jar:?]
        at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
Caused by: org.postgresql.util.PSQLException: ERROR: current transaction is aborted, commands ignored until end of transaction block
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440) ~[postgresql-jdbc3.jar:42.2.5]
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183) ~[postgresql-jdbc3.jar:42.2.5]
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308) ~[postgresql-jdbc3.jar:42.2.5]
        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441) ~[postgresql-jdbc3.jar:42.2.5]
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365) ~[postgresql-jdbc3.jar:42.2.5]
        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:143) ~[postgresql-jdbc3.jar:42.2.5]
        at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:106) ~[postgresql-jdbc3.jar:42.2.5]
        at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52) ~[HikariCP-2.6.1.jar:?]
        at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java) ~[HikariCP-2.6.1.jar:?]
        at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeQuery(ParamLoggingPreparedStatement.java:375) ~[datanucleus-rdbms-4.1.19.jar:?]
        at org.datanucleus.store.rdbms.SQLController.executeStatementQuery(SQLController.java:552) ~[datanucleus-rdbms-4.1.19.jar:?]
        at org.datanucleus.store.rdbms.query.SQLQuery.performExecute(SQLQuery.java:645) ~[datanucleus-rdbms-4.1.19.jar:?]
        at org.datanucleus.store.query.Query.executeQuery(Query.java:1855) ~[datanucleus-core-4.1.17.jar:?]
        at org.datanucleus.store.rdbms.query.SQLQuery.executeWithArray(SQLQuery.java:807) ~[datanucleus-rdbms-4.1.19.jar:?]
        at org.datanucleus.store.query.Query.execute(Query.java:1726) ~[datanucleus-core-4.1.17.jar:?]
        at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:374) ~[datanucleus-api-jdo-4.2.4.jar:?]
        ... 35 more
Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "UNIQUE_DATABASE"
  Detail: Key ("NAME", "CTLG_NAME")=(user_management__hive_intermediate_15759__full, spark) already exists.
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440) ~[postgresql-jdbc3.jar:42.2.5]
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183) ~[postgresql-jdbc3.jar:42.2.5]
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:481) ~[postgresql-jdbc3.jar:42.2.5]
        at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:840) ~[postgresql-jdbc3.jar:42.2.5]
        at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1538) ~[postgresql-jdbc3.jar:42.2.5]
        at com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:125) ~[HikariCP-2.6.1.jar:?]
        at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java) ~[HikariCP-2.6.1.jar:?]
        at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:366) ~[datanucleus-rdbms-4.1.19.jar:?]
        at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:676) ~[datanucleus-rdbms-4.1.19.jar:?]
        at org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:319) ~[datanucleus-rdbms-4.1.19.jar:?]
        at org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getPreparedStatementForQuery(RDBMSQueryUtils.java:211) ~[datanucleus-rdbms-4.1.19.jar:?]
        at org.datanucleus.store.rdbms.query.SQLQuery.performExecute(SQLQuery.java:633) ~[datanucleus-rdbms-4.1.19.jar:?]
        at org.datanucleus.store.query.Query.executeQuery(Query.java:1855) ~[datanucleus-core-4.1.17.jar:?]
        at org.datanucleus.store.rdbms.query.SQLQuery.executeWithArray(SQLQuery.java:807) ~[datanucleus-rdbms-4.1.19.jar:?]
        at org.datanucleus.store.query.Query.execute(Query.java:1726) ~[datanucleus-core-4.1.17.jar:?]
        at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:374) ~[datanucleus-api-jdo-4.2.4.jar:?]
        ... 35 more

 

 

 It tries to delete the HDFS database directory "/apps/spark/warehouse/{{databaseName}}.db" directory. But by this time there are some tables which were created and that data gets deleted. Please help me out here 

 

Appreciate your help

Don't have an account?
Coming from Hortonworks? Activate your account here