Hive compactor error

Expert Contributor

Hi all,

I have an issue with compaction of a Hive ACID table.

Environment: HDP 3.1.5.0-152 with Hive 3.1.0.

All compaction jobs fail with this stack trace:

2022-06-14 10:46:02,236 INFO [IPC Server handler 2 on 40882] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1653525342115_29428_m_157230162771970 asked for a task
2022-06-14 10:46:02,236 INFO [IPC Server handler 2 on 40882] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1653525342115_29428_m_157230162771970 given task: attempt_1653525342115_29428_m_000000_0
2022-06-14 10:46:03,989 INFO [IPC Server handler 2 on 40882] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1653525342115_29428_m_000000_0 is : 0.0
2022-06-14 10:46:03,994 ERROR [IPC Server handler 5 on 40882] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1653525342115_29428_m_000000_0 - exited : java.lang.NullPointerException
	at java.lang.System.arraycopy(Native Method)
	at org.apache.hadoop.io.Text.set(Text.java:225)
	at org.apache.orc.impl.StringRedBlackTree.add(StringRedBlackTree.java:59)
	at org.apache.orc.impl.writer.StringTreeWriter.writeBatch(StringTreeWriter.java:70)
	at org.apache.orc.impl.writer.StructTreeWriter.writeFields(StructTreeWriter.java:64)
	at org.apache.orc.impl.writer.StructTreeWriter.writeBatch(StructTreeWriter.java:78)
	at org.apache.orc.impl.writer.StructTreeWriter.writeRootBatch(StructTreeWriter.java:56)
	at org.apache.orc.impl.WriterImpl.addRowBatch(WriterImpl.java:557)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushInternalBatch(WriterImpl.java:297)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:334)
	at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$1.close(OrcOutputFormat.java:316)
	at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.close(CompactorMR.java:1002)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

 

Further down in the same log file, the next attempt of the task fails with this error:

2022-06-14 10:46:08,699 INFO [IPC Server handler 2 on 40882] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1653525342115_29428_m_000000_1 is : 0.0
2022-06-14 10:46:08,702 ERROR [IPC Server handler 5 on 40882] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1653525342115_29428_m_000000_1 - exited : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to CREATE_FILE /<hdfs>/<path>/<database_name>.db/<tablename>/_tmp_5b5a4f18-76ef-42c3-acb0-64b175679d54/base_0000005/bucket_00000 for DFSClient_attempt_1653525342115_29428_m_000000_1_-740576932_1 on 10.102.190.206 because this file lease is currently owned by DFSClient_attempt_1653525342115_29428_m_000000_0_-14754452_1 on 10.102.xxx.xxx
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2604)
	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:378)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2453)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2351)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:774)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:462)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)

	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1498)
	at org.apache.hadoop.ipc.Client.call(Client.java:1444)
	at org.apache.hadoop.ipc.Client.call(Client.java:1354)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
	at com.sun.proxy.$Proxy13.create(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:362)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
	at com.sun.proxy.$Proxy14.create(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:273)
	at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1211)
	at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1190)
	at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1128)
	at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:531)
	at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:528)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:542)
	at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:469)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1118)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
	at org.apache.orc.impl.PhysicalFsWriter.<init>(PhysicalFsWriter.java:95)
	at org.apache.orc.impl.WriterImpl.<init>(WriterImpl.java:177)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:94)
	at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:378)
	at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRawRecordWriter(OrcOutputFormat.java:299)
	at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.getWriter(CompactorMR.java:1029)
	at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:966)
	at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:939)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

 

However, when I try to list that file, it does not exist on HDFS (I obfuscated the path in the logs).

Any idea how to fix this issue? It's critical for me.

1 REPLY

Cloudera Employee
java.lang.NullPointerException
	at java.lang.System.arraycopy(Native Method)
	at org.apache.hadoop.io.Text.set(Text.java:225)
	at org.apache.orc.impl.StringRedBlackTree.add(StringRedBlackTree.java:59)
	at org.apache.orc.impl.writer.StringTreeWriter.writeBatch(StringTreeWriter.java:70)
	at org.apache.orc.impl.writer.StructTreeWriter.writeFields(StructTreeWriter.java:64)
	at org.apache.orc.impl.writer.StructTreeWriter.writeBatch(StructTreeWriter.java:78)
	at org.apache.orc.impl.writer.StructTreeWriter.writeRootBatch(StructTreeWriter.java:56)
	at org.apache.orc.impl.WriterImpl.addRowBatch(WriterImpl.java:557)

The above error is thrown if there is a schema mismatch between the table metadata and the ORC files. For example, the table is defined as:

create table test(str string); -- table metadata

but the ORC file dump shows:

Type: struct<str:int> ...

Please correct the schema and try again.
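
To make the comparison above concrete, here is a minimal sketch in Python that checks a table's declared columns against the `Type:` line of an ORC file dump (for example, the output of `hive --orcfiledump`). The helper names `parse_struct` and `find_mismatches` are hypothetical and not part of Hive or ORC. The sketch also assumes that ACID files wrap the user columns in a `row` struct next to an `operation` field, and it compares type names as plain text, so treat the result as a hint rather than a definitive check.

```python
def parse_struct(type_string):
    """Split a top-level struct<...> type string into (name, type) pairs."""
    text = type_string.strip()
    if not (text.startswith("struct<") and text.endswith(">")):
        raise ValueError("expected a top-level struct<...> type")
    body = text[len("struct<"):-1]

    # Split on commas that are not nested inside <...> or (...).
    fields, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        elif ch == "," and depth == 0:
            fields.append(body[start:i])
            start = i + 1
    if body[start:].strip():
        fields.append(body[start:])

    result = []
    for field in fields:
        name, _, col_type = field.partition(":")
        result.append((name.strip().lower(), col_type.strip().lower()))
    return result


def find_mismatches(table_columns, orc_type_string):
    """Return (column, declared_type, file_type) for every column that differs.

    table_columns: list of (name, type) pairs copied from the table definition.
    orc_type_string: the struct<...> text from the "Type:" line of the file dump.
    file_type is None when the column is missing from the file.
    """
    orc_map = dict(parse_struct(orc_type_string))

    # Assumption: ACID files nest the user columns in a "row" struct.
    if "operation" in orc_map and orc_map.get("row", "").startswith("struct<"):
        orc_map = dict(parse_struct(orc_map["row"]))

    mismatches = []
    for name, declared in table_columns:
        actual = orc_map.get(name.lower())
        if actual != declared.strip().lower():
            mismatches.append((name, declared, actual))
    return mismatches


if __name__ == "__main__":
    # Values taken from the example in this reply.
    for column, declared, actual in find_mismatches([("str", "string")],
                                                    "struct<str:int>"):
        print(f"column {column}: table says {declared}, file says {actual}")
```

Running the check against each base and delta file of the table may help locate which files were written with the old schema.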