Created on 07-01-2021 07:33 AM - last edited on 07-01-2021 08:45 AM by VidyaSargur
Hello All,
A Hive Query just ran for almost 7.5 Hours and I'm trying to understand what has caused such a significantly long runtime.
These are the query stats:
[2K[36;1m VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
[22;0m[2K----------------------------------------------------------------------------------------------
[2KMap 1 .......... container SUCCEEDED 3348 3348 0 0 0 0
Map 4 .......... container SUCCEEDED 27 27 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1009 1009 0 0 0 0
Reducer 3 ...... container SUCCEEDED 1 1 0 0 0 0
[2K----------------------------------------------------------------------------------------------
[2K[31;1mVERTICES: 04/04 [==========================>>] 100% ELAPSED TIME: 27285.00 s
[22;0m[2K----------------------------------------------------------------------------------------------
INFO : Query Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : OPERATION DURATION
INFO : ----------------------------------------------------------------------------------------------
INFO : Compile Query 0.26s
INFO : Prepare Plan 0.05s
INFO : Get Query Coordinator (AM) 0.00s
INFO : Submit Plan 0.13s
INFO : Start DAG 0.03s
INFO : Run DAG 27270.01s
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 20831855.00 326,923,890 1,208,403 9,569,213,817 1,112,670,816,499
INFO : Map 4 38042.00 403,640 2,166 47,259,215 73,854,016
INFO : Reducer 2 6524435.00 108,305,790 1,367,042 9,617,725,549 1,009
INFO : Reducer 3 73228.00 2,860 0 1,009 0
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : NUM_SUCCEEDED_TASKS: 4385
INFO : TOTAL_LAUNCHED_TASKS: 4385
INFO : DATA_LOCAL_TASKS: 554
INFO : RACK_LOCAL_TASKS: 2821
INFO : AM_CPU_MILLISECONDS: 1643080
INFO : AM_GC_TIME_MILLIS: 6737
INFO : File System Counters:
INFO : FILE_BYTES_READ: 1860884020897
INFO : FILE_BYTES_WRITTEN: 3773542394532
INFO : HDFS_BYTES_READ: 2949584168417
INFO : HDFS_BYTES_WRITTEN: 74639411482
INFO : HDFS_READ_OPS: 8926
INFO : HDFS_WRITE_OPS: 3029
INFO : HDFS_OP_CREATE: 2019
INFO : HDFS_OP_GET_FILE_STATUS: 3029
INFO : HDFS_OP_OPEN: 5897
INFO : HDFS_OP_RENAME: 1010
INFO : org.apache.tez.common.counters.TaskCounter:
INFO : REDUCE_INPUT_GROUPS: 410577279
INFO : REDUCE_INPUT_RECORDS: 9617725549
INFO : COMBINE_INPUT_RECORDS: 0
INFO : SPILLED_RECORDS: 20722623211
INFO : NUM_SHUFFLED_INPUTS: 3406384
INFO : NUM_SKIPPED_INPUTS: 0
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : MERGED_MAP_OUTPUTS: 71643172699
INFO : GC_TIME_MILLIS: 2577611
INFO : TASK_DURATION_MILLIS: 406524896
INFO : CPU_MILLISECONDS: 435636180
INFO : PHYSICAL_MEMORY_BYTES: 75333726371840
INFO : VIRTUAL_MEMORY_BYTES: 88797821091840
INFO : COMMITTED_HEAP_BYTES: 75333726371840
INFO : INPUT_RECORDS_PROCESSED: 9570513520
INFO : INPUT_SPLIT_LENGTH_BYTES: 2949584055487
INFO : OUTPUT_RECORDS: 9617726558
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_BYTES: 2238503743864
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 2139826941103
INFO : OUTPUT_BYTES_PHYSICAL: 1211054027209
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 425866143406
INFO : ADDITIONAL_SPILLS_BYTES_READ: 1747889484679
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : SHUFFLE_CHUNK_COUNT: 3375
INFO : SHUFFLE_BYTES: 1211054002993
INFO : SHUFFLE_BYTES_DECOMPRESSED: 2139826941103
INFO : SHUFFLE_BYTES_TO_MEM: 1003782114883
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_DISK_DIRECT: 207271888110
INFO : NUM_MEM_TO_DISK_MERGES: 180
INFO : NUM_DISK_TO_DISK_MERGES: 1116
INFO : SHUFFLE_PHASE_TIME: 14878024
INFO : MERGE_PHASE_TIME: 25783550
INFO : FIRST_EVENT_RECEIVED: 7091
INFO : LAST_EVENT_RECEIVED: 749095
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : HIVE:
INFO : CREATED_FILES: 1010
INFO : DESERIALIZE_ERRORS: 0
INFO : RECORDS_IN_Map_1: 9569213817
INFO : RECORDS_IN_Map_4: 47259215
INFO : RECORDS_OUT_0: 1
INFO : RECORDS_OUT_1_upifrm2020.upi_ft_selected_py_510: 2196207622
INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 1112670816499
INFO : RECORDS_OUT_INTERMEDIATE_Map_4: 73854016
INFO : RECORDS_OUT_INTERMEDIATE_Reducer_2: 1009
INFO : RECORDS_OUT_INTERMEDIATE_Reducer_3: 0
INFO : RECORDS_OUT_OPERATOR_FIL_32: 9570466334
INFO : RECORDS_OUT_OPERATOR_FIL_35: 47259215
INFO : RECORDS_OUT_OPERATOR_FS_10: 2196207622
INFO : RECORDS_OUT_OPERATOR_FS_20: 1
INFO : RECORDS_OUT_OPERATOR_GBY_16: 1009
INFO : RECORDS_OUT_OPERATOR_GBY_18: 1
INFO : RECORDS_OUT_OPERATOR_MAP_0: 0
INFO : RECORDS_OUT_OPERATOR_MERGEJOIN_31: 2196207622
INFO : RECORDS_OUT_OPERATOR_RS_17: 1009
INFO : RECORDS_OUT_OPERATOR_RS_34: 1112670816499
INFO : RECORDS_OUT_OPERATOR_RS_37: 73854016
INFO : RECORDS_OUT_OPERATOR_SEL_15: 2196207622
INFO : RECORDS_OUT_OPERATOR_SEL_33: 9570466334
INFO : RECORDS_OUT_OPERATOR_SEL_36: 47259215
INFO : RECORDS_OUT_OPERATOR_SEL_9: 2196207622
INFO : RECORDS_OUT_OPERATOR_TS_0: 9570466334
INFO : RECORDS_OUT_OPERATOR_TS_3: 47259215
INFO : TOTAL_TABLE_ROWS_WRITTEN: 2196207622
INFO : Shuffle Errors:
INFO : BAD_ID: 0
INFO : CONNECTION: 0
INFO : IO_ERROR: 0
INFO : WRONG_LENGTH: 0
INFO : WRONG_MAP: 0
INFO : WRONG_REDUCE: 0
INFO : Shuffle Errors_Reducer_2_INPUT_Map_1:
INFO : BAD_ID: 0
INFO : CONNECTION: 0
INFO : IO_ERROR: 0
INFO : WRONG_LENGTH: 0
INFO : WRONG_MAP: 0
INFO : WRONG_REDUCE: 0
INFO : Shuffle Errors_Reducer_2_INPUT_Map_4:
INFO : BAD_ID: 0
INFO : CONNECTION: 0
INFO : IO_ERROR: 0
INFO : WRONG_LENGTH: 0
INFO : WRONG_MAP: 0
INFO : WRONG_REDUCE: 0
INFO : TaskCounter_Map_1_INPUT_a:
INFO : INPUT_RECORDS_PROCESSED: 9570466334
INFO : INPUT_SPLIT_LENGTH_BYTES: 2949222655519
INFO : TaskCounter_Map_1_OUTPUT_Reducer_2:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : OUTPUT_BYTES: 2237144940183
INFO : OUTPUT_BYTES_PHYSICAL: 1210237277829
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 2138373445444
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 9570466334
INFO : SHUFFLE_CHUNK_COUNT: 3348
INFO : SPILLED_RECORDS: 9570466334
INFO : TaskCounter_Map_4_INPUT_b:
INFO : INPUT_RECORDS_PROCESSED: 46177
INFO : INPUT_SPLIT_LENGTH_BYTES: 361399968
INFO : TaskCounter_Map_4_OUTPUT_Reducer_2:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : OUTPUT_BYTES: 1348619833
INFO : OUTPUT_BYTES_PHYSICAL: 808977386
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 1443301721
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 47259215
INFO : SHUFFLE_CHUNK_COUNT: 27
INFO : SPILLED_RECORDS: 47259215
INFO : TaskCounter_Reducer_2_INPUT_Map_1:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 1747171072110
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 425288767549
INFO : COMBINE_INPUT_RECORDS: 0
INFO : FIRST_EVENT_RECEIVED: 3494
INFO : LAST_EVENT_RECEIVED: 668391
INFO : MERGED_MAP_OUTPUTS: 71643145456
INFO : MERGE_PHASE_TIME: 25289522
INFO : NUM_DISK_TO_DISK_MERGES: 1116
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_MEM_TO_DISK_MERGES: 180
INFO : NUM_SHUFFLED_INPUTS: 3378132
INFO : NUM_SKIPPED_INPUTS: 0
INFO : REDUCE_INPUT_GROUPS: 363318064
INFO : REDUCE_INPUT_RECORDS: 9570466334
INFO : SHUFFLE_BYTES: 1210237277829
INFO : SHUFFLE_BYTES_DECOMPRESSED: 2138373445444
INFO : SHUFFLE_BYTES_DISK_DIRECT: 207129855550
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 1003107422279
INFO : SHUFFLE_PHASE_TIME: 14385309
INFO : SPILLED_RECORDS: 11057638447
INFO : TaskCounter_Reducer_2_INPUT_Map_4:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 718412569
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 577375857
INFO : COMBINE_INPUT_RECORDS: 0
INFO : FIRST_EVENT_RECEIVED: 3584
INFO : LAST_EVENT_RECEIVED: 7551
INFO : MERGED_MAP_OUTPUTS: 27243
INFO : MERGE_PHASE_TIME: 494028
INFO : NUM_DISK_TO_DISK_MERGES: 0
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_MEM_TO_DISK_MERGES: 0
INFO : NUM_SHUFFLED_INPUTS: 27243
INFO : NUM_SKIPPED_INPUTS: 0
INFO : REDUCE_INPUT_GROUPS: 47259215
INFO : REDUCE_INPUT_RECORDS: 47259215
INFO : SHUFFLE_BYTES: 808977386
INFO : SHUFFLE_BYTES_DECOMPRESSED: 1443301721
INFO : SHUFFLE_BYTES_DISK_DIRECT: 141036712
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 667940674
INFO : SHUFFLE_PHASE_TIME: 419560
INFO : SPILLED_RECORDS: 47259215
INFO : TaskCounter_Reducer_2_OUTPUT_Reducer_3:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : OUTPUT_BYTES: 10183848
INFO : OUTPUT_BYTES_PHYSICAL: 7771994
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 10193938
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 1009
INFO : SPILLED_RECORDS: 0
INFO : TaskCounter_Reducer_3_INPUT_Reducer_2:
INFO : FIRST_EVENT_RECEIVED: 13
INFO : INPUT_RECORDS_PROCESSED: 1009
INFO : LAST_EVENT_RECEIVED: 73153
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_SHUFFLED_INPUTS: 1009
INFO : SHUFFLE_BYTES: 7747778
INFO : SHUFFLE_BYTES_DECOMPRESSED: 10193938
INFO : SHUFFLE_BYTES_DISK_DIRECT: 995848
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 6751930
INFO : SHUFFLE_PHASE_TIME: 73155
INFO : TaskCounter_Reducer_3_OUTPUT_out_Reducer_3:
INFO : OUTPUT_RECORDS: 0
INFO : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO : GROUPED_INPUT_SPLITS_Map_1: 3348
INFO : GROUPED_INPUT_SPLITS_Map_4: 27
INFO : INPUT_DIRECTORIES_Map_1: 96
INFO : INPUT_DIRECTORIES_Map_4: 1
INFO : INPUT_FILES_Map_1: 5843
INFO : INPUT_FILES_Map_4: 27
INFO : RAW_INPUT_SPLITS_Map_1: 5843
INFO : RAW_INPUT_SPLITS_Map_4: 27
This is the query:
create table mydb.mytbl
as select col1,col2,...col18
from mydb.tbl1 a
inner join mydb.tbl2 b on concat(trim(col1),trim(col2))=colb where date_partition_col between date_sub('2021-05-10',95) and '2021-05-10';
This is the explain plan:
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| Plan optimized by CBO. |
| |
| Vertex dependency in root stage |
| Map 1 <- Map 2 (BROADCAST_EDGE) |
| |
| Stage-0 |
| Fetch Operator |
| limit:-1 |
| Stage-1 |
| Map 1 vectorized |
| File Output Operator [FS_30] |
| Select Operator [SEL_29] (rows=122690953 width=12844) |
| Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17"] |
| Map Join Operator [MAPJOIN_28] (rows=122690953 width=12844) |
| Conds:SEL_27._col18=RS_25._col0(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17"] |
| <-Map 2 [BROADCAST_EDGE] vectorized |
| BROADCAST [RS_25] |
| PartitionCols:_col0 |
| Select Operator [SEL_24] (rows=47259215 width=110) |
| Output:["_col0"] |
| Filter Operator [FIL_23] (rows=47259215 width=110) |
| predicate:colb is not null |
| TableScan [TS_3] (rows=47259215 width=110) |
| mydb@tbl2,b, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["colb"] |
| <-Select Operator [SEL_27] (rows=111537228 width=12844) |
| Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17","_col18"] |
| Filter Operator [FIL_26] (rows=111537228 width=12844) |
| predicate:(concat(trim(col1), trim(col2)) is not null and CAST( date_partition_col AS DATE) BETWEEN DATE'2021-02-04' AND DATE'2021-05-10') |
| TableScan [TS_0] (rows=1056668483 width=12844) |
| mydb@tbl1,a,Tbl:PARTIAL,Col:PARTIAL,Output:["col1","col2",...,"col18"] |
Can someone help me figure out the bottlenecks here?
Thanks,
Megh
Created 07-01-2021 10:28 AM
I see the two large tables are scanned and it is done at Map 1 phase and scanned around 9,569,213,817 records.
predicate:(concat(trim(col1), trim(col2)) is not null and CAST( date_partition_col AS DATE) BETWEEN DATE'2021-02-04' AND DATE'2021-05-10') ==> scanning entire large table
Can you try below one:
create table mydb.mytbl
as select col1,col2,...col18 where here date_partition_col between date_sub('2021-05-10',95) and '2021-05-10 from from mydb.tbl1 a
inner join mydb.tbl2 b on concat(trim(col1),trim(col2))=colb;
Created 07-06-2021 02:18 AM
Hi @vidanimegh, Has @asish's reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur,Created 07-08-2021 10:23 PM
Hi @asish ,
Apologies for the late reply.
I tried the query you have suggested but still don't observe any difference in the runtime.
Thanks,
Megh
Created 07-09-2021 04:28 AM
hi megh
Can you share application logs and beeline trace .
Thaanks,
Asish
Created 08-02-2021 03:56 AM
Hello @asish ,
Apologies for not replying earlier.
I recently thought that I'll compare a simple count(*) query performance on a single partition on my old cluster vs the current cluster and observe the difference I'm getting in the query counters.
What I discovered is that there is one GC_TIME_MILLIS counter which is significantly higher in the current cluster.
to give an example, this is the query I ran on both the clusters (Both clusters are having identical hardware and same amount of resources):
select count(*) from mydb.tbl1 where date_partition_col='2021-07-30';
Following is the trace from the old cluster:
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 64 64 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 79.65 s
----------------------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 79.64 seconds
INFO :
INFO : Query Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : OPERATION DURATION
INFO : ----------------------------------------------------------------------------------------------
INFO : Compile Query 0.23s
INFO : Prepare Plan 7.90s
INFO : Get Query Coordinator (AM) 0.21s
INFO : Submit Plan 0.24s
INFO : Start DAG 0.96s
INFO : Run DAG 79.64s
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 73845.00 1,642,830 10,289 127,396,873 95
INFO : Reducer 2 31853.00 4,120 71 64 0
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : NUM_SUCCEEDED_TASKS: 65
INFO : TOTAL_LAUNCHED_TASKS: 65
INFO : DATA_LOCAL_TASKS: 7
INFO : RACK_LOCAL_TASKS: 57
INFO : AM_CPU_MILLISECONDS: 14500
INFO : AM_GC_TIME_MILLIS: 0
INFO : File System Counters:
INFO : FILE_BYTES_READ: 300
INFO : FILE_BYTES_WRITTEN: 3840
INFO : HDFS_BYTES_READ: 39636387430
INFO : HDFS_BYTES_WRITTEN: 248
INFO : HDFS_READ_OPS: 130
INFO : HDFS_WRITE_OPS: 2
INFO : HDFS_OP_CREATE: 1
INFO : HDFS_OP_GET_FILE_STATUS: 3
INFO : HDFS_OP_OPEN: 127
INFO : HDFS_OP_RENAME: 1
INFO : org.apache.tez.common.counters.TaskCounter:
INFO : SPILLED_RECORDS: 0
INFO : NUM_SHUFFLED_INPUTS: 64
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : GC_TIME_MILLIS: 10360
INFO : TASK_DURATION_MILLIS: 1505522
INFO : CPU_MILLISECONDS: 1646950
INFO : PHYSICAL_MEMORY_BYTES: 139586437120
INFO : VIRTUAL_MEMORY_BYTES: 607444127744
INFO : COMMITTED_HEAP_BYTES: 139586437120
INFO : INPUT_RECORDS_PROCESSED: 127429957
INFO : INPUT_SPLIT_LENGTH_BYTES: 39636387430
INFO : OUTPUT_RECORDS: 64
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_BYTES: 384
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 896
INFO : OUTPUT_BYTES_PHYSICAL: 3328
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : SHUFFLE_BYTES: 1792
INFO : SHUFFLE_BYTES_DECOMPRESSED: 896
INFO : SHUFFLE_BYTES_TO_MEM: 1652
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_DISK_DIRECT: 140
INFO : SHUFFLE_PHASE_TIME: 30751
INFO : FIRST_EVENT_RECEIVED: 89
INFO : LAST_EVENT_RECEIVED: 30749
INFO : HIVE:
INFO : CREATED_FILES: 1
INFO : DESERIALIZE_ERRORS: 0
INFO : RECORDS_IN_Map_1: 127396873
INFO : RECORDS_OUT_0: 1
INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 95
INFO : RECORDS_OUT_INTERMEDIATE_Reducer_2: 0
INFO : RECORDS_OUT_OPERATOR_FS_13: 1
INFO : RECORDS_OUT_OPERATOR_GBY_10: 64
INFO : RECORDS_OUT_OPERATOR_GBY_12: 1
INFO : RECORDS_OUT_OPERATOR_MAP_0: 0
INFO : RECORDS_OUT_OPERATOR_RS_11: 95
INFO : RECORDS_OUT_OPERATOR_SEL_9: 127429893
INFO : RECORDS_OUT_OPERATOR_TS_0: 127429893
INFO : TaskCounter_Map_1_INPUT_urcs_transactions:
INFO : INPUT_RECORDS_PROCESSED: 127429893
INFO : INPUT_SPLIT_LENGTH_BYTES: 39636387430
INFO : TaskCounter_Map_1_OUTPUT_Reducer_2:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : OUTPUT_BYTES: 384
INFO : OUTPUT_BYTES_PHYSICAL: 3328
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 896
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 64
INFO : SPILLED_RECORDS: 0
INFO : TaskCounter_Reducer_2_INPUT_Map_1:
INFO : FIRST_EVENT_RECEIVED: 89
INFO : INPUT_RECORDS_PROCESSED: 64
INFO : LAST_EVENT_RECEIVED: 30749
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_SHUFFLED_INPUTS: 64
INFO : SHUFFLE_BYTES: 1792
INFO : SHUFFLE_BYTES_DECOMPRESSED: 896
INFO : SHUFFLE_BYTES_DISK_DIRECT: 140
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 1652
INFO : SHUFFLE_PHASE_TIME: 30751
INFO : TaskCounter_Reducer_2_OUTPUT_out_Reducer_2:
INFO : OUTPUT_RECORDS: 0
INFO : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO : GROUPED_INPUT_SPLITS_Map_1: 64
INFO : INPUT_DIRECTORIES_Map_1: 1
INFO : INPUT_FILES_Map_1: 127
INFO : RAW_INPUT_SPLITS_Map_1: 127
INFO : Completed executing command(queryId=hive_20210802162145_c6dc5748-ef38-42fb-becd-46dd33d2dd16); Time taken: 88.954 seconds
INFO : OK
+------------+
| _c0 |
+------------+
| 127429893 |
+------------+
1 row selected (89.234 seconds)
Same query trace from the new cluster:
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 64 64 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 176.45 s
----------------------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 176.42 seconds
INFO :
INFO : Query Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : OPERATION DURATION
INFO : ----------------------------------------------------------------------------------------------
INFO : Compile Query 0.14s
INFO : Prepare Plan 0.04s
INFO : Get Query Coordinator (AM) 0.00s
INFO : Submit Plan 4.79s
INFO : Start DAG 0.04s
INFO : Run DAG 176.42s
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1 173773.00 9,458,900 151,977 127,398,948 95
INFO : Reducer 2 53863.00 2,490 0 64 0
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : NUM_SUCCEEDED_TASKS: 65
INFO : TOTAL_LAUNCHED_TASKS: 65
INFO : DATA_LOCAL_TASKS: 11
INFO : RACK_LOCAL_TASKS: 53
INFO : AM_CPU_MILLISECONDS: 22160
INFO : AM_GC_TIME_MILLIS: 0
INFO : File System Counters:
INFO : FILE_BYTES_READ: 480
INFO : FILE_BYTES_WRITTEN: 3840
INFO : HDFS_BYTES_READ: 39636387430
INFO : HDFS_BYTES_WRITTEN: 109
INFO : HDFS_READ_OPS: 129
INFO : HDFS_WRITE_OPS: 2
INFO : HDFS_OP_CREATE: 1
INFO : HDFS_OP_GET_FILE_STATUS: 2
INFO : HDFS_OP_OPEN: 127
INFO : HDFS_OP_RENAME: 1
INFO : org.apache.tez.common.counters.TaskCounter:
INFO : SPILLED_RECORDS: 0
INFO : NUM_SHUFFLED_INPUTS: 64
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : GC_TIME_MILLIS: 151977
INFO : TASK_DURATION_MILLIS: 4806095
INFO : CPU_MILLISECONDS: 9461390
INFO : PHYSICAL_MEMORY_BYTES: 318765006848
INFO : VIRTUAL_MEMORY_BYTES: 607499304960
INFO : COMMITTED_HEAP_BYTES: 318765006848
INFO : INPUT_RECORDS_PROCESSED: 127429957
INFO : INPUT_SPLIT_LENGTH_BYTES: 39636387430
INFO : OUTPUT_RECORDS: 64
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_BYTES: 384
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 896
INFO : OUTPUT_BYTES_PHYSICAL: 3328
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : SHUFFLE_BYTES: 1792
INFO : SHUFFLE_BYTES_DECOMPRESSED: 896
INFO : SHUFFLE_BYTES_TO_MEM: 1568
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_DISK_DIRECT: 224
INFO : SHUFFLE_PHASE_TIME: 54159
INFO : FIRST_EVENT_RECEIVED: 92
INFO : LAST_EVENT_RECEIVED: 54149
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : HIVE:
INFO : CREATED_FILES: 1
INFO : DESERIALIZE_ERRORS: 0
INFO : RECORDS_IN_Map_1: 127398948
INFO : RECORDS_OUT_0: 1
INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 95
INFO : RECORDS_OUT_INTERMEDIATE_Reducer_2: 0
INFO : RECORDS_OUT_OPERATOR_FS_13: 1
INFO : RECORDS_OUT_OPERATOR_GBY_10: 64
INFO : RECORDS_OUT_OPERATOR_GBY_12: 1
INFO : RECORDS_OUT_OPERATOR_MAP_0: 0
INFO : RECORDS_OUT_OPERATOR_RS_11: 95
INFO : RECORDS_OUT_OPERATOR_SEL_9: 127429893
INFO : RECORDS_OUT_OPERATOR_TS_0: 127429893
INFO : TaskCounter_Map_1_INPUT_urcs_transactions:
INFO : INPUT_RECORDS_PROCESSED: 127429893
INFO : INPUT_SPLIT_LENGTH_BYTES: 39636387430
INFO : TaskCounter_Map_1_OUTPUT_Reducer_2:
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : DATA_BYTES_VIA_EVENT: 0
INFO : OUTPUT_BYTES: 384
INFO : OUTPUT_BYTES_PHYSICAL: 3328
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 896
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_RECORDS: 64
INFO : SPILLED_RECORDS: 0
INFO : TaskCounter_Reducer_2_INPUT_Map_1:
INFO : FIRST_EVENT_RECEIVED: 92
INFO : INPUT_RECORDS_PROCESSED: 64
INFO : LAST_EVENT_RECEIVED: 54149
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : NUM_SHUFFLED_INPUTS: 64
INFO : SHUFFLE_BYTES: 1792
INFO : SHUFFLE_BYTES_DECOMPRESSED: 896
INFO : SHUFFLE_BYTES_DISK_DIRECT: 224
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_TO_MEM: 1568
INFO : SHUFFLE_PHASE_TIME: 54159
INFO : TaskCounter_Reducer_2_OUTPUT_out_Reducer_2:
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 64 64 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 176.45 s
----------------------------------------------------------------------------------------------
+------------+
| _c0 |
+------------+
| 127429893 |
+------------+
1 row selected (181.488 seconds)
The only significant difference I see between these two is for GC_TIME_MILLIS.
I tried to tune multiple Hive and YARN config settings related to Heap Memory and GC but there is no difference observed.
Any ideas?
Thanks,
Megh
Created 08-03-2021 03:25 AM
Hi Megh, Thanks for the logs. I found a discrepancy in HDP logs,as per "Query counters on HDP Cluster.txt" ,the Mapper spawned is 64 but from the application_1627033595344_12159_hdp.log,I observe it is 127. VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED ---------------------------------------------------------------------------------------------- Map 1 .......... container SUCCEEDED 64 64 0 0 0 0 Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0 ---------------------------------------------------------------------------------------------- grep "Event:TASK_STARTED]: vertexName=Map 1" syslog_dag_1627033595344_12159_1 | wc -l 127 grep "Event:TASK_FINISHED]: vertexName=Map 1" syslog_dag_1627033595344_12159_1 | wc -l 127 Question: have you provided correct application log for HDP ? As I see a discrepancy. Assuming you have provided correct set of logs for HDO,I compared between CDP and HDP,I see that input records for each maper is more in CDP then HDP that is the reason it took more tie in CDP. I have collected samle taskID and INPUT_RECORDS_PROCESSED for each maper. application_1627885418606_0016_cdp.log ====== each Mapper took close to 1 min to finish taskId=task_1627885418606_0016_1_00_000059 INPUT_RECORDS_PROCESSED=2012469 taskId=task_1627885418606_0016_1_00_000061 INPUT_RECORDS_PROCESSED=2010369 taskId=task_1627885418606_0016_1_00_000026 INPUT_RECORDS_PROCESSED=2021105 application_1627033595344_12159_hdp.log ========== taskId=task_1627033595344_12159_1_00_000062 INPUT_RECORDS_PROCESSED=1010250 taskId=task_1627033595344_12159_1_00_000126 INPUT_RECORDS_PROCESSED=376062 Thanks, Asish
Created 08-08-2021 10:24 PM
@vidanimegh, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur,Created 08-12-2021 05:01 AM
Hi @VidyaSargur ,
We're having an open thread on Cloudera support case, where @asish and I are discussing this same issue.
As and when we get a resolution for this, We'll update this question and close.
Thanks,
Megh
Created 08-12-2021 05:10 AM
@vidanimegh, Sure. Thanks for your response, will wait for your update.
Regards,
Vidya Sargur,