Created 06-14-2022 01:58 AM
Hi all,
I have an issue with compaction of a Hive ACID table.
Environment: HDP 3.1.5.0-152 with Hive 3.1.0.
All compaction jobs fail with this stack trace:
2022-06-14 10:46:02,236 INFO [IPC Server handler 2 on 40882] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1653525342115_29428_m_157230162771970 asked for a task
2022-06-14 10:46:02,236 INFO [IPC Server handler 2 on 40882] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1653525342115_29428_m_157230162771970 given task: attempt_1653525342115_29428_m_000000_0
2022-06-14 10:46:03,989 INFO [IPC Server handler 2 on 40882] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1653525342115_29428_m_000000_0 is : 0.0
2022-06-14 10:46:03,994 ERROR [IPC Server handler 5 on 40882] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1653525342115_29428_m_000000_0 - exited : java.lang.NullPointerException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.io.Text.set(Text.java:225)
at org.apache.orc.impl.StringRedBlackTree.add(StringRedBlackTree.java:59)
at org.apache.orc.impl.writer.StringTreeWriter.writeBatch(StringTreeWriter.java:70)
at org.apache.orc.impl.writer.StructTreeWriter.writeFields(StructTreeWriter.java:64)
at org.apache.orc.impl.writer.StructTreeWriter.writeBatch(StructTreeWriter.java:78)
at org.apache.orc.impl.writer.StructTreeWriter.writeRootBatch(StructTreeWriter.java:56)
at org.apache.orc.impl.WriterImpl.addRowBatch(WriterImpl.java:557)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushInternalBatch(WriterImpl.java:297)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:334)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$1.close(OrcOutputFormat.java:316)
at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.close(CompactorMR.java:1002)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Further down in the log file I see this error:
2022-06-14 10:46:08,699 INFO [IPC Server handler 2 on 40882] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1653525342115_29428_m_000000_1 is : 0.0
2022-06-14 10:46:08,702 ERROR [IPC Server handler 5 on 40882] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1653525342115_29428_m_000000_1 - exited : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to CREATE_FILE /<hdfs>/<path>/<database_name>.db/<tablename>/_tmp_5b5a4f18-76ef-42c3-acb0-64b175679d54/base_0000005/bucket_00000 for DFSClient_attempt_1653525342115_29428_m_000000_1_-740576932_1 on 10.102.190.206 because this file lease is currently owned by DFSClient_attempt_1653525342115_29428_m_000000_0_-14754452_1 on 10.102.xxx.xxx
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2604)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:378)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2453)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2351)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:774)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:462)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1444)
at org.apache.hadoop.ipc.Client.call(Client.java:1354)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy13.create(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:362)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy14.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:273)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1211)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1190)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1128)
at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:531)
at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:528)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:542)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:469)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1118)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
at org.apache.orc.impl.PhysicalFsWriter.<init>(PhysicalFsWriter.java:95)
at org.apache.orc.impl.WriterImpl.<init>(WriterImpl.java:177)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:94)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:378)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRawRecordWriter(OrcOutputFormat.java:299)
at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.getWriter(CompactorMR.java:1029)
at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:966)
at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:939)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
However, when I try to list that file, it does not exist on HDFS (I obfuscated the path in the logs).
Any idea how to fix this issue? It is critical for me.
Created 01-23-2023 08:41 AM
java.lang.NullPointerException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.io.Text.set(Text.java:225)
at org.apache.orc.impl.StringRedBlackTree.add(StringRedBlackTree.java:59)
at org.apache.orc.impl.writer.StringTreeWriter.writeBatch(StringTreeWriter.java:70)
at org.apache.orc.impl.writer.StructTreeWriter.writeFields(StructTreeWriter.java:64)
at org.apache.orc.impl.writer.StructTreeWriter.writeBatch(StructTreeWriter.java:78)
at org.apache.orc.impl.writer.StructTreeWriter.writeRootBatch(StructTreeWriter.java:56)
at org.apache.orc.impl.WriterImpl.addRowBatch(WriterImpl.java:557)
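The above error is thrown when there is a schema mismatch between the table metadata and the ORC file. For example, the table is defined as
create table test(str string); -- table metadata
while the ORC file dump shows a different type for the same column:
Type: struct<str:int> ...
Please correct the schema so that both sides agree, then run the compaction again.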
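To find which column disagrees, you can compare the column types from the table definition with the "Type:" line printed by the ORC file dump (for example from hive --orcfiledump on one of the table's files). Below is a minimal Python sketch of that comparison. The function names (split_top_level, parse_struct, schema_mismatches) are made up for illustration and are not part of Hive or ORC, and it assumes you paste in the struct<...> string that describes the row columns by hand.

```python
def split_top_level(body):
    """Split 'a:int,b:struct<c:string>' on commas that are not nested in <> or ()."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(body[start:i])
            start = i + 1
    if body[start:]:
        parts.append(body[start:])
    return parts


def parse_struct(type_str):
    """Turn 'struct<name:type,...>' into a dict of lowercased name -> type."""
    type_str = type_str.strip()
    if not (type_str.startswith("struct<") and type_str.endswith(">")):
        raise ValueError("expected a struct<...> type string: %r" % type_str)
    body = type_str[len("struct<"):-1]
    fields = {}
    for part in split_top_level(body):
        name, _, col_type = part.partition(":")
        fields[name.strip().lower()] = col_type.strip().lower()
    return fields


def schema_mismatches(table_columns, orc_type):
    """Return (column, declared_type, orc_type) for every column that disagrees.

    table_columns: dict of column name -> type as declared in the table DDL.
    orc_type: the struct<...> string copied from the ORC file dump.
    A column missing from the ORC file is reported with orc_type None.
    """
    orc_fields = parse_struct(orc_type)
    mismatches = []
    for name, declared in table_columns.items():
        actual = orc_fields.get(name.lower())
        if actual != declared.lower():
            mismatches.append((name, declared, actual))
    return mismatches


if __name__ == "__main__":
    # Values taken from the example above: the table says string, the file says int.
    for column, declared, actual in schema_mismatches({"str": "string"}, "struct<str:int>"):
        print("column %s: table declares %s, ORC file has %s" % (column, declared, actual))
```

Any column it reports is a candidate for the mismatch described above. The sketch only does an exact, case-insensitive string comparison of type names, so it will also flag differences that Hive might tolerate.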