
Hive vertex error, unknown cause

New Contributor

We recently upgraded our CDP environments from Runtime 7.2.12 to 7.2.16, and we are now getting a weird vertex error that we are unable to find the root cause for.

 

Does anyone have a suggestion where to start looking?

 

ERROR:

Vertex vertex_1677501096465_0141_1_61 [Reducer 21] killed/failed due to:OTHER_VERTEX_FAILURE, counters=Counters: 0, vertexStats=firstTaskStartTime=1677588363293, firstTasksToStart=[ task_1677501096465_0141_1_61_000000 ], lastTaskFinishTime=1677591606664, lastTasksToFinish=[ task_1677501096465_0141_1_61_000000 ], minTaskDuration=-1, maxTaskDuration=-1, avgTaskDuration=-1.0, numSuccessfulTasks=0, shortestDurationTasks=[ ], longestDurationTasks=[ ], vertexTaskStats={numFailedTaskAttempts=0, numKilledTaskAttempts=2, numCompletedTasks=1, numSucceededTasks=0, numKilledTasks=1, numFailedTasks=0}, servicePluginInfo=ServicePluginInfo {containerLauncherName=TezYarn, taskSchedulerName=TezYarn, taskCommunicatorName=TezYarn, containerLauncherClassName=org.apache.tez.dag.app.launcher.TezContainerLauncherImpl, taskSchedulerClassName=org.apache.tez.dag.app.rm.YarnTaskSchedulerService, taskCommunicatorClassName=org.apache.tez.dag.app.TezTaskCommunicatorImpl }
2023-02-28 13:40:06,664 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|: vertex_1677501096465_0141_1_61 [Reducer 21] transitioned from TERMINATING to KILLED due to event V_TASK_COMPLETED
2023-02-28 13:40:06,664 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: Vertex vertex_1677501096465_0141_1_61 [Reducer 21] completed., numCompletedVertices=79, numSuccessfulVertices=65, numFailedVertices=1, numKilledVertices=13, numVertices=79
2023-02-28 13:40:06,664 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: Checking vertices for DAG completion, numCompletedVertices=79, numSuccessfulVertices=65, numFailedVertices=1, numKilledVertices=13, numVertices=79, commitInProgress=0, terminationCause=VERTEX_FAILURE
2023-02-28 13:40:06,664 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:13
2023-02-28 13:40:06,807 [INFO] [Dispatcher thread {Central}] |HistoryEventHandler.criticalEvents|: [HISTORY][DAG:dag_1677501096465_0141_1][Event:DAG_FINISHED]: dagId=dag_1677501096465_0141_1, startTime=1677588043718, finishTime=1677591606664, timeTaken=3562946, status=FAILED, diagnostics=Vertex failed, vertexName=Reducer 36, vertexId=vertex_1677501096465_0141_1_70, diagnostics=[Task failed, taskId=task_1677501096465_0141_1_70_000001, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.hive.serde2.WriteBuffers.nextBufferToWrite(WriteBuffers.java:261)
    at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:237)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$LazyBinaryKvWriter.writeValue(MapJoinBytesTableContainer.java:333)
    at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.writeFirstValueRecord(BytesBytesMultiHashMap.java:896)
    at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:440)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:450)
    at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:242)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTableInternal(MapJoinOperator.java:385)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:454)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.initializeOp(MapJoinOperator.java:241)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:374)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:193)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
, errorMessage=Cannot recover from this error:java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.hive.serde2.WriteBuffers.nextBufferToWrite(WriteBuffers.java:261)
    at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:237)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$LazyBinaryKvWriter.writeValue(MapJoinBytesTableContainer.java:333)
    at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.writeFirstValueRecord(BytesBytesMultiHashMap.java:896)
    at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:440)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:450)
    at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:242)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTableInternal(MapJoinOperator.java:385)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:454)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.initializeOp(MapJoinOperator.java:241)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:374)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:193)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

1 ACCEPTED SOLUTION

Guru

@APG_JWinthaegen I see you are getting an OOM error:

 

java.lang.OutOfMemoryError: Java heap space

 

Please try increasing the container size and check:

 

set hive.tez.container.size=10240;

set tez.runtime.io.sort.mb=4096; (40% of hive.tez.container.size)

 

If it still fails, please keep increasing these values. You can tweak the parameters at the session level itself.
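
As an illustration only (the table and query below are placeholders, not from the original job, and 10240/4096 are just starting values you may need to raise further), a session-level run could look like this:

-- Session-level settings: these apply only to the current Beeline/Hive session, not cluster-wide
set hive.tez.container.size=10240;
-- Sort buffer, roughly 40% of hive.tez.container.size
set tez.runtime.io.sort.mb=4096;

-- Optionally confirm the values took effect in this session
set hive.tez.container.size;
set tez.runtime.io.sort.mb;

-- Then re-run the failing query in the same session; hypothetical example
-- (sales_summary, sales and regions are placeholder tables):
insert overwrite table sales_summary
select s.region_id, sum(s.amount)
from sales s
join regions r on s.region_id = r.id
group by s.region_id;

If the query succeeds at a higher setting, you can keep that value for this workload only instead of raising the cluster-wide default.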
