Created 07-12-2019 03:10 PM
I have some Hive inserts that work absolutely fine on HDP 2.6.5 with Red Hat 6.8, but the same inserts fail on HDP 2.6.5 with Red Hat 7.6/7.5.
To reproduce it, create a table (preferably an ACID table) and insert rows one at a time in a single session. After 100+ rows (sometimes 129, sometimes 130), the inserts start failing with a "could only be replicated to 0 nodes" error.
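Roughly, this is how I reproduce it (a minimal sketch; the table name, schema, and HiveServer2 host are placeholders, not the actual failing job):

```bash
#!/usr/bin/env bash
# Minimal reproduction sketch. The table name, schema, and HiveServer2 URL
# below are assumptions/placeholders, not taken from the actual failing job.
URL="jdbc:hive2://hiveserver2-host:10000/analyticsdb"

{
  echo "CREATE TABLE IF NOT EXISTS t_repro (id INT, payload STRING)
        CLUSTERED BY (id) INTO 2 BUCKETS
        STORED AS ORC TBLPROPERTIES ('transactional'='true');"
  # All single-row inserts run in ONE beeline session; on my cluster they
  # start failing around the 129th/130th row.
  for i in $(seq 1 200); do
    echo "INSERT INTO t_repro VALUES ($i, 'row $i');"
  done
} > /tmp/repro.sql

beeline -u "$URL" -n hive -f /tmp/repro.sql
```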
Surprisingly, if I start the same inserts again, after some time it fails again after the same 100-odd inserts. I tried tuning the Ambari properties for Hive/Tez/YARN using the HDP guidelines, but that did not help much. The DataNode replication error could be misleading, and when I tried inserting into non-transactional tables, I hit some other errors as well. All of those backtraces are pasted below:
1) Backtrace 1: could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
INFO : Dag name: INSERT INTO analyticsdb....15:17:50.417',46)(Stage-1)
ERROR : Failed to execute tez graph.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hive/hive/_tez_session_dir/e891a1be-8544-4bb4-a9f6-4a83a18094e2/.tez/application_1562919643063_0016/tez-conf.pb could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1719)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3372)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3296)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:850)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:504)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
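The error message blames the DataNode even though, as noted above, it may be misleading. For completeness, these are the generic checks I would use to rule out a real capacity problem (the local data directory is the HDP default and is an assumption):

```bash
# Generic HDFS health checks for "could only be replicated to 0 nodes":
# the NameNode rejects every live DataNode as a block target when a node
# is full, has a failed volume, or has exhausted its transfer threads.
hdfs dfsadmin -report                # live DataNodes and remaining capacity
hdfs fsck /tmp/hive -files -blocks   # block placement under the Tez staging dir
df -h /hadoop/hdfs/data              # on the DataNode host (HDP default path; an assumption)
```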
2) Backtrace 2: Hive Runtime Error while closing operators
INFO : Dag name: INSERT INTO analyticsdb....15:17:50.417',46)(Stage-1)
INFO : Status: Running (Executing on YARN cluster with App id application_1562919643063_0011)
--------------------------------------------------------------------------------
VERTICES         STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1           RUNNING      1          0        0        1       4       0
--------------------------------------------------------------------------------
VERTICES: 00/01 [>>--------------------------] 0% ELAPSED TIME: 11.19 s
--------------------------------------------------------------------------------
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1562919643063_0011_200_00, diagnostics=[Task failed, taskId=task_1562919643063_0011_200_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:370)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:164)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/hive/warehouse/analyticsdb.db/tbl_accessareareaders_cdc/.hive-staging_hive_2019-07-12_09-57-54_555_2284217577217095088-3/_task_tmp.-ext-10000/_tmp.000000_0 could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1719)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3372)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3296)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:850)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:504)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
3) Backtrace 3: Failed with exception Buffer underflow.
INFO : Dag name: INSERT INTO analyticsdb....15:17:50.417',46)(Stage-1)
INFO : Loading data to table analyticsdb.tbl_accessareareaders_cdc from hdfs://ip-172-31-17-232.ec2.internal:8020/apps/hive/warehouse/analyticsdb.db/tbl_accessareareaders_cdc/.hive-staging_hive_2019-07-12_09-27-38_852_1168222145805851570-3/-ext-10000
INFO : [Warning] could not update stats.Failed with exception Buffer underflow.
org.apache.hive.com.esotericsoftware.kryo.KryoException: Buffer underflow.
at org.apache.hive.com.esotericsoftware.kryo.io.Input.require(Input.java:181)
at org.apache.hive.com.esotericsoftware.kryo.io.Input.readVarInt(Input.java:355)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:809)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:670)
at org.apache.hadoop.hive.ql.stats.fs.FSStatsAggregator.connect(FSStatsAggregator.java:66)
at org.apache.hadoop.hive.ql.exec.StatsTask.createStatsAggregator(StatsTask.java:318)
at org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats(StatsTask.java:149)
at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:122)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:162)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1765)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1506)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1303)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1170)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1165)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:197)
at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:76)
at org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
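Since this third one is only a stats-aggregation warning (the data load itself completes), a possible way to sidestep it, though not a root-cause fix, would be to disable automatic stats gathering for the session, something like:

```bash
# The "Buffer underflow" comes from FSStatsAggregator (Hive's filesystem-based
# stats aggregation), and the data load itself completed with only a warning.
# Mitigation sketch: skip automatic stats gathering for the session.
# The HiveServer2 host and the inserts.sql file name are assumptions.
beeline -u "jdbc:hive2://hiveserver2-host:10000/analyticsdb" -n hive \
        --hiveconf hive.stats.autogather=false \
        -f inserts.sql
```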
Created 07-16-2019 05:40 AM
I see lot of "
failed to rename /tmp/test/stage_hive_2019-07-15_11-24-45_060_5725383949242199490-280/-ext-10000/000000_0 to /apps/hive/warehouse/analyticsdb.db/test_p/000000_0_copy_135 because destination exists
" errors in name node log.
It looks like https://issues.apache.org/jira/browse/HIVE-13850, but I am already using ACID tables, so I am not sure why the same file name keeps getting generated in a loop.
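To see the pile-up, something like this counts the duplicate copy files in the table directory named in the errors above:

```bash
# Counts the 000000_0_copy_N files in the table directory from the rename
# errors above; a steadily growing count per session matches the
# duplicate-file-name behaviour described in HIVE-13850.
hdfs dfs -ls /apps/hive/warehouse/analyticsdb.db/test_p | grep -c '_copy_'
```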
Created 07-16-2019 08:40 AM
There is a workaround I found for this problem: if I have 500 rows, I have to execute them in 5 beeline/JDBC/Hive CLI sessions, so the first session inserts ~120 rows, then a new session/new JDBC connection takes over, and so on (a sketch is below). I am not sure why this happens in the first place, where the existing Hive session is not able to see the HDFS file/directory names generated by earlier inserts in that same session and generates exactly the same file name as a previous insert.
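A rough sketch of how I batch the inserts across fresh sessions (the HiveServer2 URL, the insert_N.sql file layout, and the batch size of 100 are assumptions; the batch just has to stay below the ~120-130 per-session failure point):

```bash
#!/usr/bin/env bash
# Workaround sketch: split the inserts across brand-new beeline sessions.
# The URL, the insert_N.sql file naming, and BATCH=100 are assumptions;
# any batch size below the ~120-130 per-session failure point should do.
URL="jdbc:hive2://hiveserver2-host:10000/analyticsdb"
BATCH=100
TOTAL=500

for (( start=1; start<=TOTAL; start+=BATCH )); do
  end=$(( start + BATCH - 1 ))
  (( end > TOTAL )) && end=$TOTAL
  # Each iteration pipes one batch into a fresh beeline session and exits,
  # so the next batch never reuses the session that generated the earlier
  # file names.
  for (( i=start; i<=end; i++ )); do
    cat "insert_${i}.sql"
  done | beeline -u "$URL" -n hive
done
```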