Support Questions

How to identify bottlenecks from Hive Query counters and Explain Plan

Expert Contributor

Hello All,

 

A Hive query just ran for almost 7.5 hours, and I'm trying to understand what caused such a long runtime.

 

These are the query stats:

        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED   3348       3348        0        0       0       0  
Map 4 .......... container     SUCCEEDED     27         27        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED   1009       1009        0        0       0       0  
Reducer 3 ...... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 04/04  [==========================>>] 100%  ELAPSED TIME: 27285.00 s 
----------------------------------------------------------------------------------------------

INFO  : Query Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  : OPERATION                            DURATION
INFO  : ----------------------------------------------------------------------------------------------
INFO  : Compile Query                           0.26s
INFO  : Prepare Plan                            0.05s
INFO  : Get Query Coordinator (AM)              0.00s
INFO  : Submit Plan                             0.13s
INFO  : Start DAG                               0.03s
INFO  : Run DAG                             27270.01s
INFO  : ----------------------------------------------------------------------------------------------
INFO  : 
INFO  : Task Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : ----------------------------------------------------------------------------------------------
INFO  :      Map 1       20831855.00    326,923,890      1,208,403   9,569,213,817 1,112,670,816,499
INFO  :      Map 4          38042.00        403,640          2,166      47,259,215       73,854,016
INFO  :  Reducer 2        6524435.00    108,305,790      1,367,042   9,617,725,549            1,009
INFO  :  Reducer 3          73228.00          2,860              0           1,009                0
INFO  : ----------------------------------------------------------------------------------------------
INFO  : 
INFO  : org.apache.tez.common.counters.DAGCounter:
INFO  :    NUM_SUCCEEDED_TASKS: 4385
INFO  :    TOTAL_LAUNCHED_TASKS: 4385
INFO  :    DATA_LOCAL_TASKS: 554
INFO  :    RACK_LOCAL_TASKS: 2821
INFO  :    AM_CPU_MILLISECONDS: 1643080
INFO  :    AM_GC_TIME_MILLIS: 6737
INFO  : File System Counters:
INFO  :    FILE_BYTES_READ: 1860884020897
INFO  :    FILE_BYTES_WRITTEN: 3773542394532
INFO  :    HDFS_BYTES_READ: 2949584168417
INFO  :    HDFS_BYTES_WRITTEN: 74639411482
INFO  :    HDFS_READ_OPS: 8926
INFO  :    HDFS_WRITE_OPS: 3029
INFO  :    HDFS_OP_CREATE: 2019
INFO  :    HDFS_OP_GET_FILE_STATUS: 3029
INFO  :    HDFS_OP_OPEN: 5897
INFO  :    HDFS_OP_RENAME: 1010
INFO  : org.apache.tez.common.counters.TaskCounter:
INFO  :    REDUCE_INPUT_GROUPS: 410577279
INFO  :    REDUCE_INPUT_RECORDS: 9617725549
INFO  :    COMBINE_INPUT_RECORDS: 0
INFO  :    SPILLED_RECORDS: 20722623211
INFO  :    NUM_SHUFFLED_INPUTS: 3406384
INFO  :    NUM_SKIPPED_INPUTS: 0
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    MERGED_MAP_OUTPUTS: 71643172699
INFO  :    GC_TIME_MILLIS: 2577611
INFO  :    TASK_DURATION_MILLIS: 406524896
INFO  :    CPU_MILLISECONDS: 435636180
INFO  :    PHYSICAL_MEMORY_BYTES: 75333726371840
INFO  :    VIRTUAL_MEMORY_BYTES: 88797821091840
INFO  :    COMMITTED_HEAP_BYTES: 75333726371840
INFO  :    INPUT_RECORDS_PROCESSED: 9570513520
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 2949584055487
INFO  :    OUTPUT_RECORDS: 9617726558
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_BYTES: 2238503743864
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 2139826941103
INFO  :    OUTPUT_BYTES_PHYSICAL: 1211054027209
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 425866143406
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 1747889484679
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    SHUFFLE_CHUNK_COUNT: 3375
INFO  :    SHUFFLE_BYTES: 1211054002993
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 2139826941103
INFO  :    SHUFFLE_BYTES_TO_MEM: 1003782114883
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 207271888110
INFO  :    NUM_MEM_TO_DISK_MERGES: 180
INFO  :    NUM_DISK_TO_DISK_MERGES: 1116
INFO  :    SHUFFLE_PHASE_TIME: 14878024
INFO  :    MERGE_PHASE_TIME: 25783550
INFO  :    FIRST_EVENT_RECEIVED: 7091
INFO  :    LAST_EVENT_RECEIVED: 749095
INFO  :    DATA_BYTES_VIA_EVENT: 0
INFO  : HIVE:
INFO  :    CREATED_FILES: 1010
INFO  :    DESERIALIZE_ERRORS: 0
INFO  :    RECORDS_IN_Map_1: 9569213817
INFO  :    RECORDS_IN_Map_4: 47259215
INFO  :    RECORDS_OUT_0: 1
INFO  :    RECORDS_OUT_1_upifrm2020.upi_ft_selected_py_510: 2196207622
INFO  :    RECORDS_OUT_INTERMEDIATE_Map_1: 1112670816499
INFO  :    RECORDS_OUT_INTERMEDIATE_Map_4: 73854016
INFO  :    RECORDS_OUT_INTERMEDIATE_Reducer_2: 1009
INFO  :    RECORDS_OUT_INTERMEDIATE_Reducer_3: 0
INFO  :    RECORDS_OUT_OPERATOR_FIL_32: 9570466334
INFO  :    RECORDS_OUT_OPERATOR_FIL_35: 47259215
INFO  :    RECORDS_OUT_OPERATOR_FS_10: 2196207622
INFO  :    RECORDS_OUT_OPERATOR_FS_20: 1
INFO  :    RECORDS_OUT_OPERATOR_GBY_16: 1009
INFO  :    RECORDS_OUT_OPERATOR_GBY_18: 1
INFO  :    RECORDS_OUT_OPERATOR_MAP_0: 0
INFO  :    RECORDS_OUT_OPERATOR_MERGEJOIN_31: 2196207622
INFO  :    RECORDS_OUT_OPERATOR_RS_17: 1009
INFO  :    RECORDS_OUT_OPERATOR_RS_34: 1112670816499
INFO  :    RECORDS_OUT_OPERATOR_RS_37: 73854016
INFO  :    RECORDS_OUT_OPERATOR_SEL_15: 2196207622
INFO  :    RECORDS_OUT_OPERATOR_SEL_33: 9570466334
INFO  :    RECORDS_OUT_OPERATOR_SEL_36: 47259215
INFO  :    RECORDS_OUT_OPERATOR_SEL_9: 2196207622
INFO  :    RECORDS_OUT_OPERATOR_TS_0: 9570466334
INFO  :    RECORDS_OUT_OPERATOR_TS_3: 47259215
INFO  :    TOTAL_TABLE_ROWS_WRITTEN: 2196207622
INFO  : Shuffle Errors:
INFO  :    BAD_ID: 0
INFO  :    CONNECTION: 0
INFO  :    IO_ERROR: 0
INFO  :    WRONG_LENGTH: 0
INFO  :    WRONG_MAP: 0
INFO  :    WRONG_REDUCE: 0
INFO  : Shuffle Errors_Reducer_2_INPUT_Map_1:
INFO  :    BAD_ID: 0
INFO  :    CONNECTION: 0
INFO  :    IO_ERROR: 0
INFO  :    WRONG_LENGTH: 0
INFO  :    WRONG_MAP: 0
INFO  :    WRONG_REDUCE: 0
INFO  : Shuffle Errors_Reducer_2_INPUT_Map_4:
INFO  :    BAD_ID: 0
INFO  :    CONNECTION: 0
INFO  :    IO_ERROR: 0
INFO  :    WRONG_LENGTH: 0
INFO  :    WRONG_MAP: 0
INFO  :    WRONG_REDUCE: 0
INFO  : TaskCounter_Map_1_INPUT_a:
INFO  :    INPUT_RECORDS_PROCESSED: 9570466334
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 2949222655519
INFO  : TaskCounter_Map_1_OUTPUT_Reducer_2:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    OUTPUT_BYTES: 2237144940183
INFO  :    OUTPUT_BYTES_PHYSICAL: 1210237277829
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 2138373445444
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 9570466334
INFO  :    SHUFFLE_CHUNK_COUNT: 3348
INFO  :    SPILLED_RECORDS: 9570466334
INFO  : TaskCounter_Map_4_INPUT_b:
INFO  :    INPUT_RECORDS_PROCESSED: 46177
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 361399968
INFO  : TaskCounter_Map_4_OUTPUT_Reducer_2:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    OUTPUT_BYTES: 1348619833
INFO  :    OUTPUT_BYTES_PHYSICAL: 808977386
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 1443301721
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 47259215
INFO  :    SHUFFLE_CHUNK_COUNT: 27
INFO  :    SPILLED_RECORDS: 47259215
INFO  : TaskCounter_Reducer_2_INPUT_Map_1:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 1747171072110
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 425288767549
INFO  :    COMBINE_INPUT_RECORDS: 0
INFO  :    FIRST_EVENT_RECEIVED: 3494
INFO  :    LAST_EVENT_RECEIVED: 668391
INFO  :    MERGED_MAP_OUTPUTS: 71643145456
INFO  :    MERGE_PHASE_TIME: 25289522
INFO  :    NUM_DISK_TO_DISK_MERGES: 1116
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_MEM_TO_DISK_MERGES: 180
INFO  :    NUM_SHUFFLED_INPUTS: 3378132
INFO  :    NUM_SKIPPED_INPUTS: 0
INFO  :    REDUCE_INPUT_GROUPS: 363318064
INFO  :    REDUCE_INPUT_RECORDS: 9570466334
INFO  :    SHUFFLE_BYTES: 1210237277829
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 2138373445444
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 207129855550
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 1003107422279
INFO  :    SHUFFLE_PHASE_TIME: 14385309
INFO  :    SPILLED_RECORDS: 11057638447
INFO  : TaskCounter_Reducer_2_INPUT_Map_4:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 718412569
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 577375857
INFO  :    COMBINE_INPUT_RECORDS: 0
INFO  :    FIRST_EVENT_RECEIVED: 3584
INFO  :    LAST_EVENT_RECEIVED: 7551
INFO  :    MERGED_MAP_OUTPUTS: 27243
INFO  :    MERGE_PHASE_TIME: 494028
INFO  :    NUM_DISK_TO_DISK_MERGES: 0
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_MEM_TO_DISK_MERGES: 0
INFO  :    NUM_SHUFFLED_INPUTS: 27243
INFO  :    NUM_SKIPPED_INPUTS: 0
INFO  :    REDUCE_INPUT_GROUPS: 47259215
INFO  :    REDUCE_INPUT_RECORDS: 47259215
INFO  :    SHUFFLE_BYTES: 808977386
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 1443301721
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 141036712
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 667940674
INFO  :    SHUFFLE_PHASE_TIME: 419560
INFO  :    SPILLED_RECORDS: 47259215
INFO  : TaskCounter_Reducer_2_OUTPUT_Reducer_3:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    DATA_BYTES_VIA_EVENT: 0
INFO  :    OUTPUT_BYTES: 10183848
INFO  :    OUTPUT_BYTES_PHYSICAL: 7771994
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 10193938
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 1009
INFO  :    SPILLED_RECORDS: 0
INFO  : TaskCounter_Reducer_3_INPUT_Reducer_2:
INFO  :    FIRST_EVENT_RECEIVED: 13
INFO  :    INPUT_RECORDS_PROCESSED: 1009
INFO  :    LAST_EVENT_RECEIVED: 73153
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_SHUFFLED_INPUTS: 1009
INFO  :    SHUFFLE_BYTES: 7747778
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 10193938
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 995848
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 6751930
INFO  :    SHUFFLE_PHASE_TIME: 73155
INFO  : TaskCounter_Reducer_3_OUTPUT_out_Reducer_3:
INFO  :    OUTPUT_RECORDS: 0
INFO  : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO  :    GROUPED_INPUT_SPLITS_Map_1: 3348
INFO  :    GROUPED_INPUT_SPLITS_Map_4: 27
INFO  :    INPUT_DIRECTORIES_Map_1: 96
INFO  :    INPUT_DIRECTORIES_Map_4: 1
INFO  :    INPUT_FILES_Map_1: 5843
INFO  :    INPUT_FILES_Map_4: 27
INFO  :    RAW_INPUT_SPLITS_Map_1: 5843
INFO  :    RAW_INPUT_SPLITS_Map_4: 27

This is the query:

create table mydb.mytbl
as select col1,col2,...col18
from mydb.tbl1 a
inner join mydb.tbl2 b on concat(trim(col1),trim(col2))=colb
where date_partition_col between date_sub('2021-05-10',95) and '2021-05-10';

This is the explain plan:

+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| Plan optimized by CBO.                             |
|                                                    |
| Vertex dependency in root stage                    |
| Map 1 <- Map 2 (BROADCAST_EDGE)                    |
|                                                    |
| Stage-0                                            |
|   Fetch Operator                                   |
|     limit:-1                                       |
|     Stage-1                                        |
|       Map 1 vectorized                             |
|       File Output Operator [FS_30]                 |
|         Select Operator [SEL_29] (rows=122690953 width=12844) |
|           Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17"] |
|           Map Join Operator [MAPJOIN_28] (rows=122690953 width=12844) |
|             Conds:SEL_27._col18=RS_25._col0(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17"] |
|           <-Map 2 [BROADCAST_EDGE] vectorized      |
|             BROADCAST [RS_25]                      |
|               PartitionCols:_col0                  |
|               Select Operator [SEL_24] (rows=47259215 width=110) |
|                 Output:["_col0"]                   |
|                 Filter Operator [FIL_23] (rows=47259215 width=110) |
|                   predicate:colb is not null |
|                   TableScan [TS_3] (rows=47259215 width=110) |
|                     mydb@tbl2,b, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["colb"] |
|           <-Select Operator [SEL_27] (rows=111537228 width=12844) |
|               Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17","_col18"] |
|               Filter Operator [FIL_26] (rows=111537228 width=12844) |
|                 predicate:(concat(trim(col1), trim(col2)) is not null and CAST( date_partition_col AS DATE) BETWEEN DATE'2021-02-04' AND DATE'2021-05-10') |
|                 TableScan [TS_0] (rows=1056668483 width=12844) |
|                   mydb@tbl1,a,Tbl:PARTIAL,Col:PARTIAL,Output:["col1","col2",...,"col18"] |

Can someone help me figure out the bottlenecks here?

 

Thanks,

Megh

4 REPLIES

Re: How to identify bottlenecks from Hive Query counters and Explain Plan

Expert Contributor

I see that the two large tables are scanned, and the Map 1 phase alone reads around 9,569,213,817 records.

 

predicate:(concat(trim(col1), trim(col2)) is not null and CAST( date_partition_col AS DATE) BETWEEN DATE'2021-02-04' AND DATE'2021-05-10') ==> this is scanning the entire large table
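If partition pruning is in doubt, `EXPLAIN DEPENDENCY` prints the exact input tables and partitions Hive will read, so you can see directly whether the BETWEEN filter prunes anything. A minimal sketch, reusing the placeholder table and column names from the query above:

```sql
-- Prints input_tables / input_partitions as JSON.
-- If every partition of mydb.tbl1 is listed, the date filter is not
-- pruning at scan time and the whole table will be read.
EXPLAIN DEPENDENCY
SELECT col1
FROM mydb.tbl1 a
INNER JOIN mydb.tbl2 b
  ON concat(trim(col1), trim(col2)) = colb
WHERE date_partition_col BETWEEN date_sub('2021-05-10', 95) AND '2021-05-10';
```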

 

Can you try the one below? It applies the date filter to tbl1 before the join:

create table mydb.mytbl
as select col1,col2,...col18
from (select * from mydb.tbl1
      where date_partition_col between date_sub('2021-05-10',95) and '2021-05-10') a
inner join mydb.tbl2 b on concat(trim(col1),trim(col2))=colb;
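One more thing worth cross-checking between the plan and the counters, as an observation rather than a definitive diagnosis: the EXPLAIN output shows a broadcast map join (MAPJOIN_28), but the 7.5-hour run clearly executed a shuffle join through Reducer 2 (MERGEJOIN_31 in the counters), which is where the large SPILLED_RECORDS and shuffle byte counts come from. If the broadcast side really fits in memory, the settings below can nudge Hive back toward a map join; the size value is a placeholder to tune for your containers, not a recommendation:

```sql
-- Hypothetical session settings; the threshold is a placeholder and must be
-- sized to the memory actually available in your Tez containers.
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask=true;
SET hive.auto.convert.join.noconditionaltask.size=1000000000;  -- ~1 GB, placeholder
```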

 

Re: How to identify bottlenecks from Hive Query counters and Explain Plan

Community Manager

Hi @vidanimegh, has @asish's reply helped resolve your issue? If so, please mark the appropriate reply as the solution; that will make it easier for others to find the answer in the future.


Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.


Re: How to identify bottlenecks from Hive Query counters and Explain Plan

Expert Contributor

Hi @asish ,

 

Apologies for the late reply.

 

I tried the query you suggested, but I still don't observe any difference in the runtime.

 

Thanks,

Megh

Re: How to identify bottlenecks from Hive Query counters and Explain Plan

Expert Contributor

Hi Megh,

 

Can you share the application logs and the Beeline trace?

 

Thanks,

Asish