Created 08-30-2021 09:21 PM
I am getting an out-of-memory error while running a merge query.
In Cloudera Manager, Hive on Tez has Client Java Heap Size = 8 GB and Java Heap Size of HiveServer2 = 16 GB.
The destination table is partitioned by date. The log is given below. Can anyone suggest which configuration needs to be changed here?
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 INFO : org.apache.tez.common.counters.DAGCounter: INFO : NUM_FAILED_TASKS: 1 INFO : NUM_SUCCEEDED_TASKS: 312 INFO : TOTAL_LAUNCHED_TASKS: 313 INFO : DATA_LOCAL_TASKS: 103 INFO : RACK_LOCAL_TASKS: 2 INFO : AM_CPU_MILLISECONDS: 20170 INFO : AM_GC_TIME_MILLIS: 14 INFO : File System Counters: INFO : FILE_BYTES_READ: 1692027391 INFO : FILE_BYTES_WRITTEN: 1446300353 INFO : HDFS_BYTES_READ: 1490822899 INFO : HDFS_BYTES_WRITTEN: 316667034 INFO : HDFS_READ_OPS: 46265 INFO : HDFS_WRITE_OPS: 813 INFO : HDFS_OP_CREATE: 610 INFO : HDFS_OP_GET_FILE_STATUS: 26668 INFO : HDFS_OP_MKDIRS: 202 INFO : HDFS_OP_OPEN: 19597 INFO : HDFS_OP_RENAME: 1 INFO : org.apache.tez.common.counters.TaskCounter: INFO : REDUCE_INPUT_GROUPS: 12682288 INFO : REDUCE_INPUT_RECORDS: 12682489 INFO : COMBINE_INPUT_RECORDS: 0 INFO : SPILLED_RECORDS: 25380919 INFO : NUM_SHUFFLED_INPUTS: 80831 INFO : NUM_SKIPPED_INPUTS: 26027 INFO : NUM_FAILED_SHUFFLE_INPUTS: 0 INFO : MERGED_MAP_OUTPUTS: 80726 INFO : GC_TIME_MILLIS: 23019 INFO : TASK_DURATION_MILLIS: 1358201 INFO : CPU_MILLISECONDS: 2267230 INFO : PHYSICAL_MEMORY_BYTES: 656928669696 INFO : VIRTUAL_MEMORY_BYTES: 1756473344000 INFO : COMMITTED_HEAP_BYTES: 656928669696 INFO : INPUT_RECORDS_PROCESSED: 165868 INFO : INPUT_SPLIT_LENGTH_BYTES: 10560419772 INFO : OUTPUT_RECORDS: 12698477 INFO : OUTPUT_LARGE_RECORDS: 0 INFO : OUTPUT_BYTES: 1812594164 INFO : OUTPUT_BYTES_WITH_OVERHEAD: 1843377381 INFO : OUTPUT_BYTES_PHYSICAL: 830127603 INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 613599766 INFO : ADDITIONAL_SPILLS_BYTES_READ: 724978605 INFO : ADDITIONAL_SPILL_COUNT: 0 INFO : SHUFFLE_CHUNK_COUNT: 711 INFO : SHUFFLE_BYTES: 1064140784 INFO : SHUFFLE_BYTES_DECOMPRESSED: 2073887365 INFO : SHUFFLE_BYTES_TO_MEM: 905909080 INFO : SHUFFLE_BYTES_TO_DISK: 0 INFO : SHUFFLE_BYTES_DISK_DIRECT: 158231704 INFO : NUM_MEM_TO_DISK_MERGES: 0 INFO : NUM_DISK_TO_DISK_MERGES: 0 INFO : SHUFFLE_PHASE_TIME: 812287 INFO 
: MERGE_PHASE_TIME: 825246 INFO : FIRST_EVENT_RECEIVED: 5873 INFO : LAST_EVENT_RECEIVED: 562836 INFO : DATA_BYTES_VIA_EVENT: 0 INFO : HIVE: INFO : CREATED_FILES: 203 INFO : DESERIALIZE_ERRORS: 0 INFO : RECORDS_IN_Map_1: 162322988 INFO : RECORDS_IN_Map_6: 5062154 INFO : RECORDS_OUT_0: 1 INFO : RECORDS_OUT_1_cpstxn.cps_trans_record: 5046213 INFO : RECORDS_OUT_3_default.merge_tmp_table: 0 INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 10839678 INFO : RECORDS_OUT_INTERMEDIATE_Map_6: 6287111 INFO : RECORDS_OUT_INTERMEDIATE_Reducer_2: 32084 INFO : RECORDS_OUT_INTERMEDIATE_Reducer_4: 0 INFO : RECORDS_OUT_INTERMEDIATE_Reducer_5: 0 INFO : RECORDS_OUT_INTERMEDIATE_Reducer_7: 1 INFO : RECORDS_OUT_OPERATOR_EVENT_53: 46 INFO : RECORDS_OUT_OPERATOR_FIL_33: 15941 INFO : RECORDS_OUT_OPERATOR_FIL_34: 0 INFO : RECORDS_OUT_OPERATOR_FIL_35: 15941 INFO : RECORDS_OUT_OPERATOR_FIL_36: 5046213 INFO : RECORDS_OUT_OPERATOR_FIL_57: 7604192 INFO : RECORDS_OUT_OPERATOR_FS_15: 1 INFO : RECORDS_OUT_OPERATOR_FS_31: 0 INFO : RECORDS_OUT_OPERATOR_FS_8: 5046213 INFO : RECORDS_OUT_OPERATOR_GBY_11: 202 INFO : RECORDS_OUT_OPERATOR_GBY_13: 1 INFO : RECORDS_OUT_OPERATOR_GBY_26: 15941 INFO : RECORDS_OUT_OPERATOR_GBY_28: 15941 INFO : RECORDS_OUT_OPERATOR_GBY_51: 192 INFO : RECORDS_OUT_OPERATOR_GBY_52: 46 INFO : RECORDS_OUT_OPERATOR_GBY_55: 1 INFO : RECORDS_OUT_OPERATOR_MAP_0: 0 INFO : RECORDS_OUT_OPERATOR_MERGEJOIN_47: 5062154 INFO : RECORDS_OUT_OPERATOR_RS_12: 202 INFO : RECORDS_OUT_OPERATOR_RS_18: 15941 INFO : RECORDS_OUT_OPERATOR_RS_27: 15941 INFO : RECORDS_OUT_OPERATOR_RS_48: 6287054 INFO : RECORDS_OUT_OPERATOR_RS_54: 57 INFO : RECORDS_OUT_OPERATOR_RS_56: 1 INFO : RECORDS_OUT_OPERATOR_RS_58: 10839678 INFO : RECORDS_OUT_OPERATOR_SEL_10: 5046213 INFO : RECORDS_OUT_OPERATOR_SEL_14: 1 INFO : RECORDS_OUT_OPERATOR_SEL_17: 15941 INFO : RECORDS_OUT_OPERATOR_SEL_25: 15941 INFO : RECORDS_OUT_OPERATOR_SEL_30: 0 INFO : RECORDS_OUT_OPERATOR_SEL_49: 5062154 INFO : RECORDS_OUT_OPERATOR_SEL_50: 5062154 INFO : 
RECORDS_OUT_OPERATOR_SEL_6: 5046213 INFO : RECORDS_OUT_OPERATOR_TS_0: 162322988 INFO : RECORDS_OUT_OPERATOR_TS_1: 5062154 INFO : TOTAL_TABLE_ROWS_WRITTEN: 5046213 INFO : Shuffle Errors: INFO : BAD_ID: 0 INFO : CONNECTION: 0 INFO : IO_ERROR: 0 INFO : WRONG_LENGTH: 0 INFO : WRONG_MAP: 0 INFO : WRONG_REDUCE: 0 INFO : Shuffle Errors_Reducer_2_INPUT_Map_1: INFO : BAD_ID: 0 INFO : CONNECTION: 0 INFO : IO_ERROR: 0 INFO : WRONG_LENGTH: 0 INFO : WRONG_MAP: 0 INFO : WRONG_REDUCE: 0 INFO : Shuffle Errors_Reducer_2_INPUT_Map_6: INFO : BAD_ID: 0 INFO : CONNECTION: 0 INFO : IO_ERROR: 0 INFO : WRONG_LENGTH: 0 INFO : WRONG_MAP: 0 INFO : WRONG_REDUCE: 0 INFO : Shuffle Errors_Reducer_4_INPUT_Reducer_2: INFO : BAD_ID: 0 INFO : CONNECTION: 0 INFO : IO_ERROR: 0 INFO : WRONG_LENGTH: 0 INFO : WRONG_MAP: 0 INFO : WRONG_REDUCE: 0 INFO : Shuffle Errors_Reducer_5_INPUT_Reducer_2: INFO : BAD_ID: 0 INFO : CONNECTION: 0 INFO : IO_ERROR: 0 INFO : WRONG_LENGTH: 0 INFO : WRONG_MAP: 0 INFO : WRONG_REDUCE: 0 INFO : TaskCounter_Map_1_INPUT_Reducer_7: INFO : FIRST_EVENT_RECEIVED: 1058 INFO : INPUT_RECORDS_PROCESSED: 59 INFO : LAST_EVENT_RECEIVED: 1058 INFO : NUM_FAILED_SHUFFLE_INPUTS: 0 INFO : NUM_SHUFFLED_INPUTS: 59 INFO : SHUFFLE_BYTES: 239431145 INFO : SHUFFLE_BYTES_DECOMPRESSED: 239408548 INFO : SHUFFLE_BYTES_DISK_DIRECT: 40581550 INFO : SHUFFLE_BYTES_TO_DISK: 0 INFO : SHUFFLE_BYTES_TO_MEM: 198849595 INFO : SHUFFLE_PHASE_TIME: 3670 INFO : TaskCounter_Map_1_INPUT_tx: INFO : INPUT_RECORDS_PROCESSED: 160794 INFO : INPUT_SPLIT_LENGTH_BYTES: 10173459852 INFO : TaskCounter_Map_1_OUTPUT_Reducer_2: INFO : ADDITIONAL_SPILLS_BYTES_READ: 0 INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0 INFO : ADDITIONAL_SPILL_COUNT: 0 INFO : OUTPUT_BYTES: 380907998 INFO : OUTPUT_BYTES_PHYSICAL: 188683497 INFO : OUTPUT_BYTES_WITH_OVERHEAD: 396318618 INFO : OUTPUT_LARGE_RECORDS: 0 INFO : OUTPUT_RECORDS: 7604192 INFO : SHUFFLE_CHUNK_COUNT: 59 INFO : SPILLED_RECORDS: 7604192 INFO : TaskCounter_Map_6_INPUT_otx: INFO : 
INPUT_RECORDS_PROCESSED: 4969 INFO : INPUT_SPLIT_LENGTH_BYTES: 386959920 INFO : TaskCounter_Map_6_OUTPUT_Reducer_2: INFO : ADDITIONAL_SPILLS_BYTES_READ: 0 INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0 INFO : ADDITIONAL_SPILL_COUNT: 0 INFO : OUTPUT_BYTES: 1232523507 INFO : OUTPUT_BYTES_PHYSICAL: 586260727 INFO : OUTPUT_BYTES_WITH_OVERHEAD: 1247795251 INFO : OUTPUT_LARGE_RECORDS: 0 INFO : OUTPUT_RECORDS: 5062154 INFO : SHUFFLE_CHUNK_COUNT: 46 INFO : SPILLED_RECORDS: 5062154 INFO : TaskCounter_Map_6_OUTPUT_Reducer_7: INFO : ADDITIONAL_SPILLS_BYTES_READ: 0 INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0 INFO : ADDITIONAL_SPILL_COUNT: 0 INFO : DATA_BYTES_VIA_EVENT: 0 INFO : OUTPUT_BYTES: 186657006 INFO : OUTPUT_BYTES_PHYSICAL: 47668704 INFO : OUTPUT_BYTES_WITH_OVERHEAD: 186657512 INFO : OUTPUT_LARGE_RECORDS: 0 INFO : OUTPUT_RECORDS: 46 INFO : SPILLED_RECORDS: 0 INFO : TaskCounter_Reducer_2_INPUT_Map_1: INFO : ADDITIONAL_SPILLS_BYTES_READ: 180171233 INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 154085705 INFO : COMBINE_INPUT_RECORDS: 0 INFO : FIRST_EVENT_RECEIVED: 3343 INFO : LAST_EVENT_RECEIVED: 551166 INFO : MERGED_MAP_OUTPUTS: 33706 INFO : MERGE_PHASE_TIME: 656745 INFO : NUM_DISK_TO_DISK_MERGES: 0 INFO : NUM_FAILED_SHUFFLE_INPUTS: 0 INFO : NUM_MEM_TO_DISK_MERGES: 0 INFO : NUM_SHUFFLED_INPUTS: 33706 INFO : NUM_SKIPPED_INPUTS: 25825 INFO : REDUCE_INPUT_GROUPS: 7604192 INFO : REDUCE_INPUT_RECORDS: 7604192 INFO : SHUFFLE_BYTES: 188683497 INFO : SHUFFLE_BYTES_DECOMPRESSED: 396318618 INFO : SHUFFLE_BYTES_DISK_DIRECT: 26085528 INFO : SHUFFLE_BYTES_TO_DISK: 0 INFO : SHUFFLE_BYTES_TO_MEM: 162597969 INFO : SHUFFLE_PHASE_TIME: 649878 INFO : SPILLED_RECORDS: 7604192 INFO : TaskCounter_Reducer_2_INPUT_Map_6: INFO : ADDITIONAL_SPILLS_BYTES_READ: 542792597 INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 457832598 INFO : COMBINE_INPUT_RECORDS: 0 INFO : FIRST_EVENT_RECEIVED: 1449 INFO : LAST_EVENT_RECEIVED: 4262 INFO : MERGED_MAP_OUTPUTS: 46414 INFO : MERGE_PHASE_TIME: 163079 INFO : NUM_DISK_TO_DISK_MERGES: 
0 INFO : NUM_FAILED_SHUFFLE_INPUTS: 0 INFO : NUM_MEM_TO_DISK_MERGES: 0 INFO : NUM_SHUFFLED_INPUTS: 46414 INFO : NUM_SKIPPED_INPUTS: 0 INFO : REDUCE_INPUT_GROUPS: 5062154 INFO : REDUCE_INPUT_RECORDS: 5062154 INFO : SHUFFLE_BYTES: 586260727 INFO : SHUFFLE_BYTES_DECOMPRESSED: 1247795251 INFO : SHUFFLE_BYTES_DISK_DIRECT: 84959999 INFO : SHUFFLE_BYTES_TO_DISK: 0 INFO : SHUFFLE_BYTES_TO_MEM: 501300728 INFO : SHUFFLE_PHASE_TIME: 152326 INFO : SPILLED_RECORDS: 5062154 INFO : TaskCounter_Reducer_2_OUTPUT_Reducer_3: INFO : ADDITIONAL_SPILLS_BYTES_READ: 0 INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0 INFO : ADDITIONAL_SPILL_COUNT: 0 INFO : OUTPUT_BYTES: 4776782 INFO : OUTPUT_BYTES_PHYSICAL: 1358681 INFO : OUTPUT_BYTES_WITH_OVERHEAD: 4840792 INFO : OUTPUT_LARGE_RECORDS: 0 INFO : OUTPUT_RECORDS: 15941 INFO : SHUFFLE_CHUNK_COUNT: 202 INFO : SPILLED_RECORDS: 15941 INFO : TaskCounter_Reducer_2_OUTPUT_Reducer_4: INFO : ADDITIONAL_SPILLS_BYTES_READ: 0 INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0 INFO : ADDITIONAL_SPILL_COUNT: 0 INFO : OUTPUT_BYTES: 510112 INFO : OUTPUT_BYTES_PHYSICAL: 150267 INFO : OUTPUT_BYTES_WITH_OVERHEAD: 544418 INFO : OUTPUT_LARGE_RECORDS: 0 INFO : OUTPUT_RECORDS: 15941 INFO : SHUFFLE_CHUNK_COUNT: 202 INFO : SPILLED_RECORDS: 15941 INFO : TaskCounter_Reducer_2_OUTPUT_Reducer_5: INFO : ADDITIONAL_SPILLS_BYTES_READ: 0 INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0 INFO : ADDITIONAL_SPILL_COUNT: 0 INFO : OUTPUT_BYTES: 3160998 INFO : OUTPUT_BYTES_PHYSICAL: 1947548 INFO : OUTPUT_BYTES_WITH_OVERHEAD: 3163018 INFO : OUTPUT_LARGE_RECORDS: 0 INFO : OUTPUT_RECORDS: 202 INFO : SHUFFLE_CHUNK_COUNT: 202 INFO : SPILLED_RECORDS: 202 INFO : TaskCounter_Reducer_4_INPUT_Reducer_2: INFO : ADDITIONAL_SPILLS_BYTES_READ: 122588 INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 97329 INFO : COMBINE_INPUT_RECORDS: 0 INFO : FIRST_EVENT_RECEIVED: 3 INFO : LAST_EVENT_RECEIVED: 2619 INFO : MERGED_MAP_OUTPUTS: 404 INFO : MERGE_PHASE_TIME: 2682 INFO : NUM_DISK_TO_DISK_MERGES: 0 INFO : NUM_FAILED_SHUFFLE_INPUTS: 0 
INFO : NUM_MEM_TO_DISK_MERGES: 0 INFO : NUM_SHUFFLED_INPUTS: 404 INFO : NUM_SKIPPED_INPUTS: 0 INFO : REDUCE_INPUT_GROUPS: 15941 INFO : REDUCE_INPUT_RECORDS: 15941 INFO : SHUFFLE_BYTES: 150267 INFO : SHUFFLE_BYTES_DECOMPRESSED: 544418 INFO : SHUFFLE_BYTES_DISK_DIRECT: 25259 INFO : SHUFFLE_BYTES_TO_DISK: 0 INFO : SHUFFLE_BYTES_TO_MEM: 125008 INFO : SHUFFLE_PHASE_TIME: 2664 INFO : SPILLED_RECORDS: 15941 INFO : TaskCounter_Reducer_4_OUTPUT_out_Reducer_4: INFO : OUTPUT_RECORDS: 0 INFO : TaskCounter_Reducer_5_INPUT_Reducer_2: INFO : ADDITIONAL_SPILLS_BYTES_READ: 1892187 INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 1584134 INFO : COMBINE_INPUT_RECORDS: 0 INFO : FIRST_EVENT_RECEIVED: 2 INFO : LAST_EVENT_RECEIVED: 2723 INFO : MERGED_MAP_OUTPUTS: 202 INFO : MERGE_PHASE_TIME: 2740 INFO : NUM_DISK_TO_DISK_MERGES: 0 INFO : NUM_FAILED_SHUFFLE_INPUTS: 0 INFO : NUM_MEM_TO_DISK_MERGES: 0 INFO : NUM_SHUFFLED_INPUTS: 202 INFO : NUM_SKIPPED_INPUTS: 202 INFO : REDUCE_INPUT_GROUPS: 1 INFO : REDUCE_INPUT_RECORDS: 202 INFO : SHUFFLE_BYTES: 1947548 INFO : SHUFFLE_BYTES_DECOMPRESSED: 3163018 INFO : SHUFFLE_BYTES_DISK_DIRECT: 308053 INFO : SHUFFLE_BYTES_TO_DISK: 0 INFO : SHUFFLE_BYTES_TO_MEM: 1639495 INFO : SHUFFLE_PHASE_TIME: 2728 INFO : SPILLED_RECORDS: 202 INFO : TaskCounter_Reducer_5_OUTPUT_out_Reducer_5: INFO : OUTPUT_RECORDS: 0 INFO : TaskCounter_Reducer_7_INPUT_Map_6: INFO : FIRST_EVENT_RECEIVED: 18 INFO : INPUT_RECORDS_PROCESSED: 46 INFO : LAST_EVENT_RECEIVED: 1008 INFO : NUM_FAILED_SHUFFLE_INPUTS: 0 INFO : NUM_SHUFFLED_INPUTS: 46 INFO : SHUFFLE_BYTES: 47667600 INFO : SHUFFLE_BYTES_DECOMPRESSED: 186657512 INFO : SHUFFLE_BYTES_DISK_DIRECT: 6271315 INFO : SHUFFLE_BYTES_TO_DISK: 0 INFO : SHUFFLE_BYTES_TO_MEM: 41396285 INFO : SHUFFLE_PHASE_TIME: 1021 INFO : TaskCounter_Reducer_7_OUTPUT_Map_1: INFO : ADDITIONAL_SPILLS_BYTES_READ: 0 INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0 INFO : ADDITIONAL_SPILL_COUNT: 0 INFO : DATA_BYTES_VIA_EVENT: 0 INFO : OUTPUT_BYTES: 4057761 INFO : OUTPUT_BYTES_PHYSICAL: 
4058179 INFO : OUTPUT_BYTES_WITH_OVERHEAD: 4057772 INFO : OUTPUT_LARGE_RECORDS: 0 INFO : OUTPUT_RECORDS: 1 INFO : SPILLED_RECORDS: 0 INFO : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters: INFO : GROUPED_INPUT_SPLITS_Map_1: 59 INFO : GROUPED_INPUT_SPLITS_Map_6: 46 INFO : INPUT_DIRECTORIES_Map_1: 9 INFO : INPUT_DIRECTORIES_Map_6: 1 INFO : INPUT_FILES_Map_1: 3722 INFO : INPUT_FILES_Map_6: 46 INFO : RAW_INPUT_SPLITS_Map_1: 3722 INFO : RAW_INPUT_SPLITS_Map_6: 46 ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 3, vertexId=vertex_1629975626227_3988_3_06, diagnostics=[Task failed, taskId=task_1629975626227_3988_3_06_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.<init>(BytesColumnVector.java:85) at org.apache.orc.TypeDescription.createColumn(TypeDescription.java:657) at org.apache.orc.TypeDescription.createColumn(TypeDescription.java:661) at org.apache.orc.TypeDescription.createRowBatch(TypeDescription.java:699) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:101) at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:389) at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.addSplitUpdateEvent(OrcRecordUpdater.java:456) at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.update(OrcRecordUpdater.java:498) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1100) at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111) at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:490) at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:392) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:249) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) , errorMessage=Cannot recover from this error:java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.<init>(BytesColumnVector.java:85) at org.apache.orc.TypeDescription.createColumn(TypeDescription.java:657) at 
org.apache.orc.TypeDescription.createColumn(TypeDescription.java:661) at org.apache.orc.TypeDescription.createRowBatch(TypeDescription.java:699) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:101) at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:389) at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.addSplitUpdateEvent(OrcRecordUpdater.java:456) at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.update(OrcRecordUpdater.java:498) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1100) at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111) at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:490) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:392) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:249) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62) 
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1629975626227_3988_3_06 [Reducer 3] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
Created 09-01-2021 10:28 PM
Hi @saikat
As I understand it, you are running a merge query and it is failing with java.lang.OutOfMemoryError.
Step 1: Please run a major compaction on all the tables involved in the merge query (only if they are ACID tables; otherwise skip step 1). Once the major compaction is triggered, make sure it has completed by running the "show compactions;" command in beeline. Compaction should reduce the number of delta files Hive has to deal with for these tables.
How to run a major compaction:
Alter table <table name> compact 'MAJOR';
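Since the destination table in the question is partitioned by date, note that Hive expects a partition spec when compacting a partitioned table. A rough sketch follows. The table name is taken from the counters in the log above and appears to be the merge target; the partition column `txn_date` and its value are only placeholders (assumptions, not from the post), so substitute your own:

```sql
-- Assumption: txn_date is a hypothetical partition column name; use the real one.
-- Repeat for each partition the merge is expected to touch.
ALTER TABLE cpstxn.cps_trans_record PARTITION (txn_date='2021-08-30') COMPACT 'major';

-- Check the compaction queue and wait until the request has finished
-- before re-running the merge.
SHOW COMPACTIONS;
```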
Step 2: Once step 1 is done, please set the following properties at the beeline session level and re-run the merge query:
set hive.tez.container.size=16384;
set hive.tez.java.opts=-Xmx13107m;
set tez.runtime.io.sort.mb=4096;
set tez.task.resource.memory.mb=16384;
set tez.am.resource.memory.mb=16384;
set tez.am.launch.cmd-opts=-Xmx13107m;
set hive.auto.convert.join=false;
The Tez container and AM sizes are set to 16 GB here. If the query still fails, you can increase them to 20 GB (20480 MB); hive.tez.java.opts and tez.am.launch.cmd-opts should then be set to about 80% of the container and AM size, that is -Xmx16384m.
If the query succeeds with a 16 GB Tez container and AM size, you can try decreasing it to 14, 12, or 10 GB to find the point at which it starts failing. This way you can save resources.
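For reference, the 20 GB variant might look like the sketch below. It only changes the size-related properties from step 2 and assumes the same 80% heap ratio (20480 × 0.8 = 16384); the other settings stay as they are:

```sql
-- Container and AM raised from 16384 MB to 20480 MB (20 GB)
set hive.tez.container.size=20480;
set tez.task.resource.memory.mb=20480;
set tez.am.resource.memory.mb=20480;

-- JVM heap kept at roughly 80% of the container/AM size
set hive.tez.java.opts=-Xmx16384m;
set tez.am.launch.cmd-opts=-Xmx16384m;
```

Whether YARN will actually grant a container this large depends on the cluster's maximum allocation settings, which are not shown in the post.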
If you are happy with the comment, please mark it "Accept as Solution".