Error while executing Hive merge query

I am getting an out-of-memory error while running a merge query.

In Cloudera Manager, Hive on Tez Client Java Heap Size = 8 GB and Java Heap Size of HiveServer2 = 16 GB.

The destination table is partitioned by date. The log is given below. Can anyone suggest which configuration needs to be changed?

 

ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
INFO  : org.apache.tez.common.counters.DAGCounter:
INFO  :    NUM_FAILED_TASKS: 1
INFO  :    NUM_SUCCEEDED_TASKS: 312
INFO  :    TOTAL_LAUNCHED_TASKS: 313
INFO  :    DATA_LOCAL_TASKS: 103
INFO  :    RACK_LOCAL_TASKS: 2
INFO  :    AM_CPU_MILLISECONDS: 20170
INFO  :    AM_GC_TIME_MILLIS: 14
INFO  : File System Counters:
INFO  :    FILE_BYTES_READ: 1692027391
INFO  :    FILE_BYTES_WRITTEN: 1446300353
INFO  :    HDFS_BYTES_READ: 1490822899
INFO  :    HDFS_BYTES_WRITTEN: 316667034
INFO  :    HDFS_READ_OPS: 46265
INFO  :    HDFS_WRITE_OPS: 813
INFO  :    HDFS_OP_CREATE: 610
INFO  :    HDFS_OP_GET_FILE_STATUS: 26668
INFO  :    HDFS_OP_MKDIRS: 202
INFO  :    HDFS_OP_OPEN: 19597
INFO  :    HDFS_OP_RENAME: 1
INFO  : org.apache.tez.common.counters.TaskCounter:
INFO  :    REDUCE_INPUT_GROUPS: 12682288
INFO  :    REDUCE_INPUT_RECORDS: 12682489
INFO  :    COMBINE_INPUT_RECORDS: 0
INFO  :    SPILLED_RECORDS: 25380919
INFO  :    NUM_SHUFFLED_INPUTS: 80831
INFO  :    NUM_SKIPPED_INPUTS: 26027
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    MERGED_MAP_OUTPUTS: 80726
INFO  :    GC_TIME_MILLIS: 23019
INFO  :    TASK_DURATION_MILLIS: 1358201
INFO  :    CPU_MILLISECONDS: 2267230
INFO  :    PHYSICAL_MEMORY_BYTES: 656928669696
INFO  :    VIRTUAL_MEMORY_BYTES: 1756473344000
INFO  :    COMMITTED_HEAP_BYTES: 656928669696
INFO  :    INPUT_RECORDS_PROCESSED: 165868
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 10560419772
INFO  :    OUTPUT_RECORDS: 12698477
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_BYTES: 1812594164
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 1843377381
INFO  :    OUTPUT_BYTES_PHYSICAL: 830127603
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 613599766
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 724978605
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    SHUFFLE_CHUNK_COUNT: 711
INFO  :    SHUFFLE_BYTES: 1064140784
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 2073887365
INFO  :    SHUFFLE_BYTES_TO_MEM: 905909080
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 158231704
INFO  :    NUM_MEM_TO_DISK_MERGES: 0
INFO  :    NUM_DISK_TO_DISK_MERGES: 0
INFO  :    SHUFFLE_PHASE_TIME: 812287
INFO  :    MERGE_PHASE_TIME: 825246
INFO  :    FIRST_EVENT_RECEIVED: 5873
INFO  :    LAST_EVENT_RECEIVED: 562836
INFO  :    DATA_BYTES_VIA_EVENT: 0
INFO  : HIVE:
INFO  :    CREATED_FILES: 203
INFO  :    DESERIALIZE_ERRORS: 0
INFO  :    RECORDS_IN_Map_1: 162322988
INFO  :    RECORDS_IN_Map_6: 5062154
INFO  :    RECORDS_OUT_0: 1
INFO  :    RECORDS_OUT_1_cpstxn.cps_trans_record: 5046213
INFO  :    RECORDS_OUT_3_default.merge_tmp_table: 0
INFO  :    RECORDS_OUT_INTERMEDIATE_Map_1: 10839678
INFO  :    RECORDS_OUT_INTERMEDIATE_Map_6: 6287111
INFO  :    RECORDS_OUT_INTERMEDIATE_Reducer_2: 32084
INFO  :    RECORDS_OUT_INTERMEDIATE_Reducer_4: 0
INFO  :    RECORDS_OUT_INTERMEDIATE_Reducer_5: 0
INFO  :    RECORDS_OUT_INTERMEDIATE_Reducer_7: 1
INFO  :    RECORDS_OUT_OPERATOR_EVENT_53: 46
INFO  :    RECORDS_OUT_OPERATOR_FIL_33: 15941
INFO  :    RECORDS_OUT_OPERATOR_FIL_34: 0
INFO  :    RECORDS_OUT_OPERATOR_FIL_35: 15941
INFO  :    RECORDS_OUT_OPERATOR_FIL_36: 5046213
INFO  :    RECORDS_OUT_OPERATOR_FIL_57: 7604192
INFO  :    RECORDS_OUT_OPERATOR_FS_15: 1
INFO  :    RECORDS_OUT_OPERATOR_FS_31: 0
INFO  :    RECORDS_OUT_OPERATOR_FS_8: 5046213
INFO  :    RECORDS_OUT_OPERATOR_GBY_11: 202
INFO  :    RECORDS_OUT_OPERATOR_GBY_13: 1
INFO  :    RECORDS_OUT_OPERATOR_GBY_26: 15941
INFO  :    RECORDS_OUT_OPERATOR_GBY_28: 15941
INFO  :    RECORDS_OUT_OPERATOR_GBY_51: 192
INFO  :    RECORDS_OUT_OPERATOR_GBY_52: 46
INFO  :    RECORDS_OUT_OPERATOR_GBY_55: 1
INFO  :    RECORDS_OUT_OPERATOR_MAP_0: 0
INFO  :    RECORDS_OUT_OPERATOR_MERGEJOIN_47: 5062154
INFO  :    RECORDS_OUT_OPERATOR_RS_12: 202
INFO  :    RECORDS_OUT_OPERATOR_RS_18: 15941
INFO  :    RECORDS_OUT_OPERATOR_RS_27: 15941
INFO  :    RECORDS_OUT_OPERATOR_RS_48: 6287054
INFO  :    RECORDS_OUT_OPERATOR_RS_54: 57
INFO  :    RECORDS_OUT_OPERATOR_RS_56: 1
INFO  :    RECORDS_OUT_OPERATOR_RS_58: 10839678
INFO  :    RECORDS_OUT_OPERATOR_SEL_10: 5046213
INFO  :    RECORDS_OUT_OPERATOR_SEL_14: 1
INFO  :    RECORDS_OUT_OPERATOR_SEL_17: 15941
INFO  :    RECORDS_OUT_OPERATOR_SEL_25: 15941
INFO  :    RECORDS_OUT_OPERATOR_SEL_30: 0
INFO  :    RECORDS_OUT_OPERATOR_SEL_49: 5062154
INFO  :    RECORDS_OUT_OPERATOR_SEL_50: 5062154
INFO  :    RECORDS_OUT_OPERATOR_SEL_6: 5046213
INFO  :    RECORDS_OUT_OPERATOR_TS_0: 162322988
INFO  :    RECORDS_OUT_OPERATOR_TS_1: 5062154
INFO  :    TOTAL_TABLE_ROWS_WRITTEN: 5046213
INFO  : Shuffle Errors:
INFO  :    BAD_ID: 0
INFO  :    CONNECTION: 0
INFO  :    IO_ERROR: 0
INFO  :    WRONG_LENGTH: 0
INFO  :    WRONG_MAP: 0
INFO  :    WRONG_REDUCE: 0
INFO  : Shuffle Errors_Reducer_2_INPUT_Map_1:
INFO  :    BAD_ID: 0
INFO  :    CONNECTION: 0
INFO  :    IO_ERROR: 0
INFO  :    WRONG_LENGTH: 0
INFO  :    WRONG_MAP: 0
INFO  :    WRONG_REDUCE: 0
INFO  : Shuffle Errors_Reducer_2_INPUT_Map_6:
INFO  :    BAD_ID: 0
INFO  :    CONNECTION: 0
INFO  :    IO_ERROR: 0
INFO  :    WRONG_LENGTH: 0
INFO  :    WRONG_MAP: 0
INFO  :    WRONG_REDUCE: 0
INFO  : Shuffle Errors_Reducer_4_INPUT_Reducer_2:
INFO  :    BAD_ID: 0
INFO  :    CONNECTION: 0
INFO  :    IO_ERROR: 0
INFO  :    WRONG_LENGTH: 0
INFO  :    WRONG_MAP: 0
INFO  :    WRONG_REDUCE: 0
INFO  : Shuffle Errors_Reducer_5_INPUT_Reducer_2:
INFO  :    BAD_ID: 0
INFO  :    CONNECTION: 0
INFO  :    IO_ERROR: 0
INFO  :    WRONG_LENGTH: 0
INFO  :    WRONG_MAP: 0
INFO  :    WRONG_REDUCE: 0
INFO  : TaskCounter_Map_1_INPUT_Reducer_7:
INFO  :    FIRST_EVENT_RECEIVED: 1058
INFO  :    INPUT_RECORDS_PROCESSED: 59
INFO  :    LAST_EVENT_RECEIVED: 1058
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_SHUFFLED_INPUTS: 59
INFO  :    SHUFFLE_BYTES: 239431145
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 239408548
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 40581550
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 198849595
INFO  :    SHUFFLE_PHASE_TIME: 3670
INFO  : TaskCounter_Map_1_INPUT_tx:
INFO  :    INPUT_RECORDS_PROCESSED: 160794
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 10173459852
INFO  : TaskCounter_Map_1_OUTPUT_Reducer_2:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    OUTPUT_BYTES: 380907998
INFO  :    OUTPUT_BYTES_PHYSICAL: 188683497
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 396318618
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 7604192
INFO  :    SHUFFLE_CHUNK_COUNT: 59
INFO  :    SPILLED_RECORDS: 7604192
INFO  : TaskCounter_Map_6_INPUT_otx:
INFO  :    INPUT_RECORDS_PROCESSED: 4969
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 386959920
INFO  : TaskCounter_Map_6_OUTPUT_Reducer_2:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    OUTPUT_BYTES: 1232523507
INFO  :    OUTPUT_BYTES_PHYSICAL: 586260727
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 1247795251
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 5062154
INFO  :    SHUFFLE_CHUNK_COUNT: 46
INFO  :    SPILLED_RECORDS: 5062154
INFO  : TaskCounter_Map_6_OUTPUT_Reducer_7:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    DATA_BYTES_VIA_EVENT: 0
INFO  :    OUTPUT_BYTES: 186657006
INFO  :    OUTPUT_BYTES_PHYSICAL: 47668704
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 186657512
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 46
INFO  :    SPILLED_RECORDS: 0
INFO  : TaskCounter_Reducer_2_INPUT_Map_1:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 180171233
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 154085705
INFO  :    COMBINE_INPUT_RECORDS: 0
INFO  :    FIRST_EVENT_RECEIVED: 3343
INFO  :    LAST_EVENT_RECEIVED: 551166
INFO  :    MERGED_MAP_OUTPUTS: 33706
INFO  :    MERGE_PHASE_TIME: 656745
INFO  :    NUM_DISK_TO_DISK_MERGES: 0
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_MEM_TO_DISK_MERGES: 0
INFO  :    NUM_SHUFFLED_INPUTS: 33706
INFO  :    NUM_SKIPPED_INPUTS: 25825
INFO  :    REDUCE_INPUT_GROUPS: 7604192
INFO  :    REDUCE_INPUT_RECORDS: 7604192
INFO  :    SHUFFLE_BYTES: 188683497
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 396318618
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 26085528
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 162597969
INFO  :    SHUFFLE_PHASE_TIME: 649878
INFO  :    SPILLED_RECORDS: 7604192
INFO  : TaskCounter_Reducer_2_INPUT_Map_6:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 542792597
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 457832598
INFO  :    COMBINE_INPUT_RECORDS: 0
INFO  :    FIRST_EVENT_RECEIVED: 1449
INFO  :    LAST_EVENT_RECEIVED: 4262
INFO  :    MERGED_MAP_OUTPUTS: 46414
INFO  :    MERGE_PHASE_TIME: 163079
INFO  :    NUM_DISK_TO_DISK_MERGES: 0
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_MEM_TO_DISK_MERGES: 0
INFO  :    NUM_SHUFFLED_INPUTS: 46414
INFO  :    NUM_SKIPPED_INPUTS: 0
INFO  :    REDUCE_INPUT_GROUPS: 5062154
INFO  :    REDUCE_INPUT_RECORDS: 5062154
INFO  :    SHUFFLE_BYTES: 586260727
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 1247795251
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 84959999
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 501300728
INFO  :    SHUFFLE_PHASE_TIME: 152326
INFO  :    SPILLED_RECORDS: 5062154
INFO  : TaskCounter_Reducer_2_OUTPUT_Reducer_3:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    OUTPUT_BYTES: 4776782
INFO  :    OUTPUT_BYTES_PHYSICAL: 1358681
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 4840792
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 15941
INFO  :    SHUFFLE_CHUNK_COUNT: 202
INFO  :    SPILLED_RECORDS: 15941
INFO  : TaskCounter_Reducer_2_OUTPUT_Reducer_4:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    OUTPUT_BYTES: 510112
INFO  :    OUTPUT_BYTES_PHYSICAL: 150267
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 544418
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 15941
INFO  :    SHUFFLE_CHUNK_COUNT: 202
INFO  :    SPILLED_RECORDS: 15941
INFO  : TaskCounter_Reducer_2_OUTPUT_Reducer_5:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    OUTPUT_BYTES: 3160998
INFO  :    OUTPUT_BYTES_PHYSICAL: 1947548
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 3163018
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 202
INFO  :    SHUFFLE_CHUNK_COUNT: 202
INFO  :    SPILLED_RECORDS: 202
INFO  : TaskCounter_Reducer_4_INPUT_Reducer_2:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 122588
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 97329
INFO  :    COMBINE_INPUT_RECORDS: 0
INFO  :    FIRST_EVENT_RECEIVED: 3
INFO  :    LAST_EVENT_RECEIVED: 2619
INFO  :    MERGED_MAP_OUTPUTS: 404
INFO  :    MERGE_PHASE_TIME: 2682
INFO  :    NUM_DISK_TO_DISK_MERGES: 0
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_MEM_TO_DISK_MERGES: 0
INFO  :    NUM_SHUFFLED_INPUTS: 404
INFO  :    NUM_SKIPPED_INPUTS: 0
INFO  :    REDUCE_INPUT_GROUPS: 15941
INFO  :    REDUCE_INPUT_RECORDS: 15941
INFO  :    SHUFFLE_BYTES: 150267
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 544418
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 25259
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 125008
INFO  :    SHUFFLE_PHASE_TIME: 2664
INFO  :    SPILLED_RECORDS: 15941
INFO  : TaskCounter_Reducer_4_OUTPUT_out_Reducer_4:
INFO  :    OUTPUT_RECORDS: 0
INFO  : TaskCounter_Reducer_5_INPUT_Reducer_2:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 1892187
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 1584134
INFO  :    COMBINE_INPUT_RECORDS: 0
INFO  :    FIRST_EVENT_RECEIVED: 2
INFO  :    LAST_EVENT_RECEIVED: 2723
INFO  :    MERGED_MAP_OUTPUTS: 202
INFO  :    MERGE_PHASE_TIME: 2740
INFO  :    NUM_DISK_TO_DISK_MERGES: 0
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_MEM_TO_DISK_MERGES: 0
INFO  :    NUM_SHUFFLED_INPUTS: 202
INFO  :    NUM_SKIPPED_INPUTS: 202
INFO  :    REDUCE_INPUT_GROUPS: 1
INFO  :    REDUCE_INPUT_RECORDS: 202
INFO  :    SHUFFLE_BYTES: 1947548
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 3163018
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 308053
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 1639495
INFO  :    SHUFFLE_PHASE_TIME: 2728
INFO  :    SPILLED_RECORDS: 202
INFO  : TaskCounter_Reducer_5_OUTPUT_out_Reducer_5:
INFO  :    OUTPUT_RECORDS: 0
INFO  : TaskCounter_Reducer_7_INPUT_Map_6:
INFO  :    FIRST_EVENT_RECEIVED: 18
INFO  :    INPUT_RECORDS_PROCESSED: 46
INFO  :    LAST_EVENT_RECEIVED: 1008
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_SHUFFLED_INPUTS: 46
INFO  :    SHUFFLE_BYTES: 47667600
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 186657512
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 6271315
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 41396285
INFO  :    SHUFFLE_PHASE_TIME: 1021
INFO  : TaskCounter_Reducer_7_OUTPUT_Map_1:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    DATA_BYTES_VIA_EVENT: 0
INFO  :    OUTPUT_BYTES: 4057761
INFO  :    OUTPUT_BYTES_PHYSICAL: 4058179
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 4057772
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 1
INFO  :    SPILLED_RECORDS: 0
INFO  : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO  :    GROUPED_INPUT_SPLITS_Map_1: 59
INFO  :    GROUPED_INPUT_SPLITS_Map_6: 46
INFO  :    INPUT_DIRECTORIES_Map_1: 9
INFO  :    INPUT_DIRECTORIES_Map_6: 1
INFO  :    INPUT_FILES_Map_1: 3722
INFO  :    INPUT_FILES_Map_6: 46
INFO  :    RAW_INPUT_SPLITS_Map_1: 3722
INFO  :    RAW_INPUT_SPLITS_Map_6: 46
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 3, vertexId=vertex_1629975626227_3988_3_06, diagnostics=[Task failed, taskId=task_1629975626227_3988_3_06_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.<init>(BytesColumnVector.java:85)
	at org.apache.orc.TypeDescription.createColumn(TypeDescription.java:657)
	at org.apache.orc.TypeDescription.createColumn(TypeDescription.java:661)
	at org.apache.orc.TypeDescription.createRowBatch(TypeDescription.java:699)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:101)
	at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:389)
	at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.addSplitUpdateEvent(OrcRecordUpdater.java:456)
	at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.update(OrcRecordUpdater.java:498)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1100)
	at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111)
	at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
	at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:490)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:392)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:249)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
, errorMessage=Cannot recover from this error:java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.<init>(BytesColumnVector.java:85)
	at org.apache.orc.TypeDescription.createColumn(TypeDescription.java:657)
	at org.apache.orc.TypeDescription.createColumn(TypeDescription.java:661)
	at org.apache.orc.TypeDescription.createRowBatch(TypeDescription.java:699)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:101)
	at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:389)
	at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.addSplitUpdateEvent(OrcRecordUpdater.java:456)
	at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.update(OrcRecordUpdater.java:498)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1100)
	at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111)
	at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
	at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:490)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:392)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:249)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1629975626227_3988_3_06 [Reducer 3] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0

 

 

1 ACCEPTED SOLUTION

Hi @saikat 

As I understand it, you are running a merge query and it is failing with java.lang.OutOfMemoryError: Java heap space.

 

Step 1: Please run a major compaction on every table involved in the merge query (this applies only to ACID tables; otherwise skip Step 1). Once the major compaction is triggered, make sure it has completed by running the "show compactions;" command in beeline. This will reduce some of the stats collection burden on Hive.

How to run a major compaction (use 'MINOR' in place of 'MAJOR' for a minor compaction):
Alter table <table name> compact 'MAJOR';

Step 2: Once Step 1 is done, set the following properties at the beeline session level and re-run the merge query:
set hive.tez.container.size=16384;
set hive.tez.java.opts=-Xmx13107m;
set tez.runtime.io.sort.mb=4096;
set tez.task.resource.memory.mb=16384;
set tez.am.resource.memory.mb=16384;
set tez.am.launch.cmd-opts=-Xmx13107m;
set hive.auto.convert.join=false;

The Tez container and AM sizes are set to 16 GB here. If the query still fails, you can increase them to 20 GB (20480 MB); hive.tez.java.opts and tez.am.launch.cmd-opts then need to be set to 80% of the container and AM size, that is -Xmx16384m.
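
The 80% rule above is simple arithmetic, and a small helper makes it easy to regenerate the session settings for another container size. The following is a minimal Python sketch, not part of the original answer: the function names `heap_mb` and `session_settings` are hypothetical, and the code only formats the same `set` statements listed in Step 2. It leaves tez.runtime.io.sort.mb and hive.auto.convert.join out because the answer does not say they should scale with the container size.

```python
from typing import List


def heap_mb(container_mb: int, percent: int = 80) -> int:
    """Return the JVM heap size in MB as a percentage of the container size."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return container_mb * percent // 100


def session_settings(container_mb: int) -> List[str]:
    """Build the beeline session-level settings for one container/AM size."""
    xmx = f"-Xmx{heap_mb(container_mb)}m"
    return [
        f"set hive.tez.container.size={container_mb};",
        f"set hive.tez.java.opts={xmx};",
        f"set tez.task.resource.memory.mb={container_mb};",
        f"set tez.am.resource.memory.mb={container_mb};",
        f"set tez.am.launch.cmd-opts={xmx};",
    ]


if __name__ == "__main__":
    # 20 GB container and AM, as suggested when 16 GB is not enough.
    print("\n".join(session_settings(20480)))
```

For 16384 MB this gives -Xmx13107m, matching the values in Step 2; for 20480 MB it gives -Xmx16384m.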

If the query succeeds with a 16 GB Tez container and AM size, you can try decreasing it to 14, 12, or 10 GB to find the threshold between failure and success. This way, you can save resources.
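
The step-down search described above can be sketched as follows. This is a minimal Python sketch under stated assumptions: `run_merge` is a hypothetical callback that you supply yourself (it should apply the session settings for the given size, run the merge query, and return True on success), and `smallest_working_size` is an illustrative name. Nothing here talks to Hive directly.

```python
from typing import Callable, Iterable, Optional


def smallest_working_size(
    sizes_gb: Iterable[int],
    run_merge: Callable[[int], bool],
) -> Optional[int]:
    """Try sizes from largest to smallest and stop at the first failure.

    Returns the smallest size that succeeded before the first failure,
    or None if even the largest size failed.
    """
    last_ok = None
    for size in sorted(sizes_gb, reverse=True):
        if not run_merge(size):
            break  # smaller sizes are assumed to fail as well
        last_ok = size
    return last_ok


if __name__ == "__main__":
    # Stand-in for a real run: pretend the query needs at least 12 GB.
    print(smallest_working_size([16, 14, 12, 10], lambda gb: gb >= 12))
```

Stopping at the first failure assumes that a size which fails will also fail at every smaller size, which is the same assumption the manual procedure makes.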

If you are happy with this answer, mark it "Accept as Solution".
