Created 02-28-2023 07:05 AM
We recently upgraded our CDP environments from runtime 7.2.12 to 7.2.16, and we are now getting a weird vertex error whose root cause we cannot find.
Does anyone have a suggestion for where to start looking?
ERROR:
Vertex vertex_1677501096465_0141_1_61 [Reducer 21] killed/failed due to:OTHER_VERTEX_FAILURE, counters=Counters: 0, vertexStats=firstTaskStartTime=1677588363293, firstTasksToStart=[ task_1677501096465_0141_1_61_000000 ], lastTaskFinishTime=1677591606664, lastTasksToFinish=[ task_1677501096465_0141_1_61_000000 ], minTaskDuration=-1, maxTaskDuration=-1, avgTaskDuration=-1.0, numSuccessfulTasks=0, shortestDurationTasks=[ ], longestDurationTasks=[ ], vertexTaskStats={numFailedTaskAttempts=0, numKilledTaskAttempts=2, numCompletedTasks=1, numSucceededTasks=0, numKilledTasks=1, numFailedTasks=0}, servicePluginInfo=ServicePluginInfo {containerLauncherName=TezYarn, taskSchedulerName=TezYarn, taskCommunicatorName=TezYarn, containerLauncherClassName=org.apache.tez.dag.app.launcher.TezContainerLauncherImpl, taskSchedulerClassName=org.apache.tez.dag.app.rm.YarnTaskSchedulerService, taskCommunicatorClassName=org.apache.tez.dag.app.TezTaskCommunicatorImpl } 2023-02-28 13:40:06,664 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|: vertex_1677501096465_0141_1_61 [Reducer 21] transitioned from TERMINATING to KILLED due to event V_TASK_COMPLETED 2023-02-28 13:40:06,664 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: Vertex vertex_1677501096465_0141_1_61 [Reducer 21] completed., numCompletedVertices=79, numSuccessfulVertices=65, numFailedVertices=1, numKilledVertices=13, numVertices=79 2023-02-28 13:40:06,664 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: Checking vertices for DAG completion, numCompletedVertices=79, numSuccessfulVertices=65, numFailedVertices=1, numKilledVertices=13, numVertices=79, commitInProgress=0, terminationCause=VERTEX_FAILURE 2023-02-28 13:40:06,664 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: DAG did not succeed due to VERTEX_FAILURE. 
failedVertices:1 killedVertices:13 2023-02-28 13:40:06,807 [INFO] [Dispatcher thread {Central}] |HistoryEventHandler.criticalEvents|: [HISTORY][DAG:dag_1677501096465_0141_1][Event:DAG_FINISHED]: dagId=dag_1677501096465_0141_1, startTime=1677588043718, finishTime=1677591606664, timeTaken=3562946, status=FAILED, diagnostics=Vertex failed, vertexName=Reducer 36, vertexId=vertex_1677501096465_0141_1_70, diagnostics=[Task failed, taskId=task_1677501096465_0141_1_70_000001, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.hive.serde2.WriteBuffers.nextBufferToWrite(WriteBuffers.java:261) at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:237) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$LazyBinaryKvWriter.writeValue(MapJoinBytesTableContainer.java:333) at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.writeFirstValueRecord(BytesBytesMultiHashMap.java:896) at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:440) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:450) at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:242) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTableInternal(MapJoinOperator.java:385) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:454) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.initializeOp(MapJoinOperator.java:241) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:374) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:193) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280) at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) , errorMessage=Cannot recover from this error:java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.hive.serde2.WriteBuffers.nextBufferToWrite(WriteBuffers.java:261) at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:237) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$LazyBinaryKvWriter.writeValue(MapJoinBytesTableContainer.java:333) at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.writeFirstValueRecord(BytesBytesMultiHashMap.java:896) at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:440) at 
org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:450) at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:242) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTableInternal(MapJoinOperator.java:385) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:454) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.initializeOp(MapJoinOperator.java:241) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:374) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:193) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)
Created 02-28-2023 09:32 PM
@APG_JWinthaegen The log shows that the task is failing with an out-of-memory error:
java.lang.OutOfMemoryError: Java heap space
Please try increasing the container size and rerun the query:
set hive.tez.container.size=10240;
set tez.runtime.io.sort.mb=4096; (40% of hive.tez.container.size)
If it still fails, keep increasing these values. You can set both parameters at the session level.
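If you end up raising the container size several times, a small helper can keep the two values in step. This is only a sketch in Python: the function name `tez_memory_settings` is hypothetical, and the 40% ratio is the rule of thumb from this reply, not an official formula. It just prints the `set` statements to paste into your session.

```python
def tez_memory_settings(container_mb: int, sort_percent: int = 40) -> list[str]:
    """Build session-level SET statements for a given container size in MB.

    Assumption: tez.runtime.io.sort.mb is kept at sort_percent (default 40%)
    of hive.tez.container.size, as suggested in the reply above.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    if not 0 < sort_percent < 100:
        raise ValueError("sort_percent must be between 1 and 99")
    # Integer arithmetic avoids floating-point rounding surprises.
    sort_mb = container_mb * sort_percent // 100
    return [
        f"set hive.tez.container.size={container_mb};",
        f"set tez.runtime.io.sort.mb={sort_mb};",
    ]


if __name__ == "__main__":
    # Print the statements for the starting point suggested above.
    for statement in tez_memory_settings(10240):
        print(statement)
```

Calling it with 10240 reproduces the two statements above; pass a larger value on the next attempt if the query still fails.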