Support Questions


Impala fails to read partitions (aws s3)

Explorer

Hello.

I am trying to read a partitioned Parquet table from S3 using Impala. Creating the table succeeds:

 

create external table database.table
  like parquet 's3a://bucket/table/dt=2020-11-06/file'
  partitioned by (dt string)
  stored as parquet
  location 's3a://bucket/table';

 

A SELECT from this table returns an empty result, even though the bucket is not empty. SHOW PARTITIONS also returns nothing.

I am trying to get Impala to discover the partitions, but both of the following statements hang and then fail with a `Read Timeout` from the Metastore:

 

ALTER TABLE database.table RECOVER PARTITIONS;

REFRESH database.table PARTITION (dt="2020-11-06");
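An alternative I have not yet tried is registering the single partition explicitly instead of recovering all of them. This is only a sketch based on the bucket layout from the CREATE TABLE above (the partition path is my assumption):

```sql
-- Hypothetical: register one partition by hand instead of RECOVER PARTITIONS
ALTER TABLE database.table ADD PARTITION (dt='2020-11-06')
  LOCATION 's3a://bucket/table/dt=2020-11-06';

-- Then have Impala reload the table metadata
REFRESH database.table;
```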

 

Metastore logs (for ALTER TABLE):

 

2020-11-07T10:14:01,782  INFO [pool-5-thread-8] metastore.HiveMetaStore: 8: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
2020-11-07T10:14:01,783  INFO [pool-5-thread-8] metastore.ObjectStore: ObjectStore, initialize called
2020-11-07T10:14:01,927  INFO [pool-5-thread-8] metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
2020-11-07T10:14:01,927  INFO [pool-5-thread-8] metastore.ObjectStore: Initialized ObjectStore
2020-11-07T10:14:01,927  INFO [pool-5-thread-8] metastore.HiveMetaStore: 8: add_partitions
2020-11-07T10:14:01,927  INFO [pool-5-thread-8] HiveMetaStore.audit: ugi=root   ip=127.0.0.1    cmd=add_partitions

 

Metastore logs (for REFRESH):

 

2020-11-07T09:44:52,939  INFO [pool-5-thread-7] metastore.HiveMetaStore: 7: source:127.0.0.1 get_partition_by_name: tbl=hive.database.table part=dt=2020-11-06
2020-11-07T09:44:52,940  INFO [pool-5-thread-7] HiveMetaStore.audit: ugi=root   ip=127.0.0.1    cmd=source:127.0.0.1 get_partition_by_name: tbl=hive.database.table part=dt=2020-11-06
2020-11-07T09:44:52,940  INFO [pool-5-thread-7] metastore.HiveMetaStore: 7: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
2020-11-07T09:44:52,941  INFO [pool-5-thread-7] metastore.ObjectStore: ObjectStore, initialize called
2020-11-07T09:44:53,014  INFO [pool-5-thread-7] metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
2020-11-07T09:44:53,014  INFO [pool-5-thread-7] metastore.ObjectStore: Initialized ObjectStore

 

Impalad logs:

I1107 10:18:00.572021  3732 Frontend.java:1245] 5f4f65bc6d08c533:5e808dfc00000000] Analyzing query: ALTER TABLE database.table RECOVER PARTITIONS db: database
I1107 10:18:00.572461  3732 Frontend.java:1286] 5f4f65bc6d08c533:5e808dfc00000000] Analysis finished.
I1107 10:19:01.868741  3451 client-request-state.cc:187] 034a1b4883de285d:cf0f264b00000000] ImpalaRuntimeException: Error making 'add_partition' RPC to Hive Metastore:
CAUSED BY: TTransportException: java.net.SocketTimeoutException: Read timed out
CAUSED BY: SocketTimeoutException: Read timed out
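One knob I have not touched is the Metastore client socket timeout. If the `add_partition` call is merely slow against S3 rather than stuck, raising it might at least change the symptom. This is a guess on my side, not verified (value in seconds):

```xml
<!-- hive-site.xml: untested assumption, not something I have applied -->
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>600</value>
</property>
```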

Can you please suggest what could have gone wrong?

 

Thank you

3 REPLIES

Explorer

Additional information:

After some time I see a second attempt (an automatic retry?) in the logs, and then it fails because a lock could not be obtained in time (the same thing happens with both Derby and MySQL):

2020-11-08T06:53:29,345 ERROR [pool-5-thread-9] metastore.RetryingHMSHandler: Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: javax.jdo.JDOException: Exception thrown when executing query : SELECT DISTINCT 'org.apache.hadoop.hive.metastore.model.MPartition' AS NUCLEUS_TYPE,A0.CREATE_TIME,A0.LAST_ACCESS_TIME,A0.PART_NAME,A0.PART_ID FROM PARTITIONS A0 LEFT OUTER JOIN TBLS B0 ON A0.TBL_ID = B0.TBL_ID LEFT OUTER JOIN DBS C0 ON B0.DB_ID = C0.DB_ID WHERE B0.TBL_NAME = ? AND C0."NAME" = ? AND A0.PART_NAME = ? AND C0.CTLG_NAME = ?
        at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:677)
        at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:391)
        at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:267)
        at org.apache.hadoop.hive.metastore.ObjectStore.getMPartition(ObjectStore.java:2520)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:2484)
        at org.apache.hadoop.hive.metastore.ObjectStore.doesPartitionExist(ObjectStore.java:9168)
        at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
        at com.sun.proxy.$Proxy25.doesPartitionExist(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.startAddPartition(HiveMetaStore.java:3722)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_core(HiveMetaStore.java:3398)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_req(HiveMetaStore.java:3525)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
        at com.sun.proxy.$Proxy27.add_partitions_req(Unknown Source)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$add_partitions_req.getResult(ThriftHiveMetastore.java:15642)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$add_partitions_req.getResult(ThriftHiveMetastore.java:15626)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
        at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
        at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
NestedThrowablesStackTrace:
java.sql.SQLTransactionRollbackException: A lock could not be obtained within the time requested
        at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
        at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeQuery(Unknown Source)
        at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
        at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
        at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeQuery(ParamLoggingPreparedStatement.java:375)
        at org.datanucleus.store.rdbms.SQLController.executeStatementQuery(SQLController.java:552)
        at org.datanucleus.store.rdbms.query.JDOQLQuery.performExecute(JDOQLQuery.java:617)
        at org.datanucleus.store.query.Query.executeQuery(Query.java:1855)
        at org.datanucleus.store.query.Query.executeWithArray(Query.java:1744)
        at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:368)
        at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:267)
        at org.apache.hadoop.hive.metastore.ObjectStore.getMPartition(ObjectStore.java:2520)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:2484)
        at org.apache.hadoop.hive.metastore.ObjectStore.doesPartitionExist(ObjectStore.java:9168)
        at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
        at com.sun.proxy.$Proxy25.doesPartitionExist(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.startAddPartition(HiveMetaStore.java:3722)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_core(HiveMetaStore.java:3398)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_req(HiveMetaStore.java:3525)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
        at com.sun.proxy.$Proxy27.add_partitions_req(Unknown Source)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$add_partitions_req.getResult(ThriftHiveMetastore.java:15642)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$add_partitions_req.getResult(ThriftHiveMetastore.java:15626)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
        at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
        at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: ERROR 40XL1: A lock could not be obtained within the time requested
        at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
        at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
        at org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(Unknown Source)
        at org.apache.derby.impl.services.locks.ConcurrentLockSet.zeroDurationLockObject(Unknown Source)
        at org.apache.derby.impl.services.locks.AbstractPool.zeroDurationlockObject(Unknown Source)
        at org.apache.derby.impl.services.locks.ConcurrentPool.zeroDurationlockObject(Unknown Source)
        at org.apache.derby.impl.store.raw.xact.RowLocking2nohold.lockRecordForRead(Unknown Source)
        at org.apache.derby.impl.store.access.heap.HeapController.lockRow(Unknown Source)
        at org.apache.derby.impl.store.access.heap.HeapController.lockRow(Unknown Source)
        at org.apache.derby.impl.store.access.btree.index.B2IRowLocking3.lockRowOnPage(Unknown Source)
        at org.apache.derby.impl.store.access.btree.index.B2IRowLocking3._lockScanRow(Unknown Source)
        at org.apache.derby.impl.store.access.btree.index.B2IRowLockingRR.lockScanRow(Unknown Source)
        at org.apache.derby.impl.store.access.btree.BTreeForwardScan.fetchRows(Unknown Source)
        at org.apache.derby.impl.store.access.btree.BTreeScan.fetchNextGroup(Unknown Source)
        at org.apache.derby.impl.sql.execute.BulkTableScanResultSet.reloadArray(Unknown Source)
        at org.apache.derby.impl.sql.execute.BulkTableScanResultSet.getNextRowCore(Unknown Source)
        at org.apache.derby.impl.sql.execute.IndexRowToBaseRowResultSet.getNextRowCore(Unknown Source)
        at org.apache.derby.impl.sql.execute.JoinResultSet.openCore(Unknown Source)
        at org.apache.derby.impl.sql.execute.JoinResultSet.openCore(Unknown Source)
        at org.apache.derby.impl.sql.execute.ProjectRestrictResultSet.openCore(Unknown Source)
        at org.apache.derby.impl.sql.execute.BasicNoPutResultSetImpl.open(Unknown Source)
        at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(Unknown Source)
        at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
        ... 43 more

 

Expert Contributor

@avlasenko It seems Impala is having trouble communicating with the Hive Metastore (HMS).

 

Could you please check if you are able to perform the same from Hive?

Explorer

Hi @tusharkathpal, thank you for the tip. I don't have Hive itself in my infrastructure, just Impala + S3 + Kudu, but I will find time to install it and try. Note, however, that I am able to run other DDL statements from Impala, so it can communicate with the Hive Metastore.
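When I do get Hive set up, my plan is to try the Hive-side equivalent of partition recovery, roughly like this (assuming MSCK REPAIR is supported in that Hive version):

```sql
-- Hive equivalent of Impala's ALTER TABLE ... RECOVER PARTITIONS
MSCK REPAIR TABLE database.table;

-- Confirm what Hive registered
SHOW PARTITIONS database.table;
```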