Support Questions

Find answers, ask questions, and share your expertise

How to identify bottlenecks from Hive Query counters and Explain Plan

avatar
Expert Contributor

Hello All,

 

A Hive Query just ran for almost 7.5 Hours and I'm trying to understand what has caused such a significantly long runtime.

 

These are the query stats:

 

 

[2K[36;1m        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
[22;0m[2K----------------------------------------------------------------------------------------------
[2KMap 1 .......... container     SUCCEEDED   3348       3348        0        0       0       0  
Map 4 .......... container     SUCCEEDED     27         27        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED   1009       1009        0        0       0       0  
Reducer 3 ...... container     SUCCEEDED      1          1        0        0       0       0  
[2K----------------------------------------------------------------------------------------------
[2K[31;1mVERTICES: 04/04  [==========================>>] 100%  ELAPSED TIME: 27285.00 s 
[22;0m[2K----------------------------------------------------------------------------------------------

INFO  : Query Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  : OPERATION                            DURATION
INFO  : ----------------------------------------------------------------------------------------------
INFO  : Compile Query                           0.26s
INFO  : Prepare Plan                            0.05s
INFO  : Get Query Coordinator (AM)              0.00s
INFO  : Submit Plan                             0.13s
INFO  : Start DAG                               0.03s
INFO  : Run DAG                             27270.01s
INFO  : ----------------------------------------------------------------------------------------------
INFO  : 
INFO  : Task Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : ----------------------------------------------------------------------------------------------
INFO  :      Map 1       20831855.00    326,923,890      1,208,403   9,569,213,817 1,112,670,816,499
INFO  :      Map 4          38042.00        403,640          2,166      47,259,215       73,854,016
INFO  :  Reducer 2        6524435.00    108,305,790      1,367,042   9,617,725,549            1,009
INFO  :  Reducer 3          73228.00          2,860              0           1,009                0
INFO  : ----------------------------------------------------------------------------------------------
INFO  : 
INFO  : org.apache.tez.common.counters.DAGCounter:
INFO  :    NUM_SUCCEEDED_TASKS: 4385
INFO  :    TOTAL_LAUNCHED_TASKS: 4385
INFO  :    DATA_LOCAL_TASKS: 554
INFO  :    RACK_LOCAL_TASKS: 2821
INFO  :    AM_CPU_MILLISECONDS: 1643080
INFO  :    AM_GC_TIME_MILLIS: 6737
INFO  : File System Counters:
INFO  :    FILE_BYTES_READ: 1860884020897
INFO  :    FILE_BYTES_WRITTEN: 3773542394532
INFO  :    HDFS_BYTES_READ: 2949584168417
INFO  :    HDFS_BYTES_WRITTEN: 74639411482
INFO  :    HDFS_READ_OPS: 8926
INFO  :    HDFS_WRITE_OPS: 3029
INFO  :    HDFS_OP_CREATE: 2019
INFO  :    HDFS_OP_GET_FILE_STATUS: 3029
INFO  :    HDFS_OP_OPEN: 5897
INFO  :    HDFS_OP_RENAME: 1010
INFO  : org.apache.tez.common.counters.TaskCounter:
INFO  :    REDUCE_INPUT_GROUPS: 410577279
INFO  :    REDUCE_INPUT_RECORDS: 9617725549
INFO  :    COMBINE_INPUT_RECORDS: 0
INFO  :    SPILLED_RECORDS: 20722623211
INFO  :    NUM_SHUFFLED_INPUTS: 3406384
INFO  :    NUM_SKIPPED_INPUTS: 0
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    MERGED_MAP_OUTPUTS: 71643172699
INFO  :    GC_TIME_MILLIS: 2577611
INFO  :    TASK_DURATION_MILLIS: 406524896
INFO  :    CPU_MILLISECONDS: 435636180
INFO  :    PHYSICAL_MEMORY_BYTES: 75333726371840
INFO  :    VIRTUAL_MEMORY_BYTES: 88797821091840
INFO  :    COMMITTED_HEAP_BYTES: 75333726371840
INFO  :    INPUT_RECORDS_PROCESSED: 9570513520
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 2949584055487
INFO  :    OUTPUT_RECORDS: 9617726558
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_BYTES: 2238503743864
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 2139826941103
INFO  :    OUTPUT_BYTES_PHYSICAL: 1211054027209
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 425866143406
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 1747889484679
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    SHUFFLE_CHUNK_COUNT: 3375
INFO  :    SHUFFLE_BYTES: 1211054002993
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 2139826941103
INFO  :    SHUFFLE_BYTES_TO_MEM: 1003782114883
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 207271888110
INFO  :    NUM_MEM_TO_DISK_MERGES: 180
INFO  :    NUM_DISK_TO_DISK_MERGES: 1116
INFO  :    SHUFFLE_PHASE_TIME: 14878024
INFO  :    MERGE_PHASE_TIME: 25783550
INFO  :    FIRST_EVENT_RECEIVED: 7091
INFO  :    LAST_EVENT_RECEIVED: 749095
INFO  :    DATA_BYTES_VIA_EVENT: 0
INFO  : HIVE:
INFO  :    CREATED_FILES: 1010
INFO  :    DESERIALIZE_ERRORS: 0
INFO  :    RECORDS_IN_Map_1: 9569213817
INFO  :    RECORDS_IN_Map_4: 47259215
INFO  :    RECORDS_OUT_0: 1
INFO  :    RECORDS_OUT_1_upifrm2020.upi_ft_selected_py_510: 2196207622
INFO  :    RECORDS_OUT_INTERMEDIATE_Map_1: 1112670816499
INFO  :    RECORDS_OUT_INTERMEDIATE_Map_4: 73854016
INFO  :    RECORDS_OUT_INTERMEDIATE_Reducer_2: 1009
INFO  :    RECORDS_OUT_INTERMEDIATE_Reducer_3: 0
INFO  :    RECORDS_OUT_OPERATOR_FIL_32: 9570466334
INFO  :    RECORDS_OUT_OPERATOR_FIL_35: 47259215
INFO  :    RECORDS_OUT_OPERATOR_FS_10: 2196207622
INFO  :    RECORDS_OUT_OPERATOR_FS_20: 1
INFO  :    RECORDS_OUT_OPERATOR_GBY_16: 1009
INFO  :    RECORDS_OUT_OPERATOR_GBY_18: 1
INFO  :    RECORDS_OUT_OPERATOR_MAP_0: 0
INFO  :    RECORDS_OUT_OPERATOR_MERGEJOIN_31: 2196207622
INFO  :    RECORDS_OUT_OPERATOR_RS_17: 1009
INFO  :    RECORDS_OUT_OPERATOR_RS_34: 1112670816499
INFO  :    RECORDS_OUT_OPERATOR_RS_37: 73854016
INFO  :    RECORDS_OUT_OPERATOR_SEL_15: 2196207622
INFO  :    RECORDS_OUT_OPERATOR_SEL_33: 9570466334
INFO  :    RECORDS_OUT_OPERATOR_SEL_36: 47259215
INFO  :    RECORDS_OUT_OPERATOR_SEL_9: 2196207622
INFO  :    RECORDS_OUT_OPERATOR_TS_0: 9570466334
INFO  :    RECORDS_OUT_OPERATOR_TS_3: 47259215
INFO  :    TOTAL_TABLE_ROWS_WRITTEN: 2196207622
INFO  : Shuffle Errors:
INFO  :    BAD_ID: 0
INFO  :    CONNECTION: 0
INFO  :    IO_ERROR: 0
INFO  :    WRONG_LENGTH: 0
INFO  :    WRONG_MAP: 0
INFO  :    WRONG_REDUCE: 0
INFO  : Shuffle Errors_Reducer_2_INPUT_Map_1:
INFO  :    BAD_ID: 0
INFO  :    CONNECTION: 0
INFO  :    IO_ERROR: 0
INFO  :    WRONG_LENGTH: 0
INFO  :    WRONG_MAP: 0
INFO  :    WRONG_REDUCE: 0
INFO  : Shuffle Errors_Reducer_2_INPUT_Map_4:
INFO  :    BAD_ID: 0
INFO  :    CONNECTION: 0
INFO  :    IO_ERROR: 0
INFO  :    WRONG_LENGTH: 0
INFO  :    WRONG_MAP: 0
INFO  :    WRONG_REDUCE: 0
INFO  : TaskCounter_Map_1_INPUT_a:
INFO  :    INPUT_RECORDS_PROCESSED: 9570466334
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 2949222655519
INFO  : TaskCounter_Map_1_OUTPUT_Reducer_2:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    OUTPUT_BYTES: 2237144940183
INFO  :    OUTPUT_BYTES_PHYSICAL: 1210237277829
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 2138373445444
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 9570466334
INFO  :    SHUFFLE_CHUNK_COUNT: 3348
INFO  :    SPILLED_RECORDS: 9570466334
INFO  : TaskCounter_Map_4_INPUT_b:
INFO  :    INPUT_RECORDS_PROCESSED: 46177
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 361399968
INFO  : TaskCounter_Map_4_OUTPUT_Reducer_2:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    OUTPUT_BYTES: 1348619833
INFO  :    OUTPUT_BYTES_PHYSICAL: 808977386
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 1443301721
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 47259215
INFO  :    SHUFFLE_CHUNK_COUNT: 27
INFO  :    SPILLED_RECORDS: 47259215
INFO  : TaskCounter_Reducer_2_INPUT_Map_1:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 1747171072110
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 425288767549
INFO  :    COMBINE_INPUT_RECORDS: 0
INFO  :    FIRST_EVENT_RECEIVED: 3494
INFO  :    LAST_EVENT_RECEIVED: 668391
INFO  :    MERGED_MAP_OUTPUTS: 71643145456
INFO  :    MERGE_PHASE_TIME: 25289522
INFO  :    NUM_DISK_TO_DISK_MERGES: 1116
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_MEM_TO_DISK_MERGES: 180
INFO  :    NUM_SHUFFLED_INPUTS: 3378132
INFO  :    NUM_SKIPPED_INPUTS: 0
INFO  :    REDUCE_INPUT_GROUPS: 363318064
INFO  :    REDUCE_INPUT_RECORDS: 9570466334
INFO  :    SHUFFLE_BYTES: 1210237277829
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 2138373445444
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 207129855550
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 1003107422279
INFO  :    SHUFFLE_PHASE_TIME: 14385309
INFO  :    SPILLED_RECORDS: 11057638447
INFO  : TaskCounter_Reducer_2_INPUT_Map_4:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 718412569
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 577375857
INFO  :    COMBINE_INPUT_RECORDS: 0
INFO  :    FIRST_EVENT_RECEIVED: 3584
INFO  :    LAST_EVENT_RECEIVED: 7551
INFO  :    MERGED_MAP_OUTPUTS: 27243
INFO  :    MERGE_PHASE_TIME: 494028
INFO  :    NUM_DISK_TO_DISK_MERGES: 0
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_MEM_TO_DISK_MERGES: 0
INFO  :    NUM_SHUFFLED_INPUTS: 27243
INFO  :    NUM_SKIPPED_INPUTS: 0
INFO  :    REDUCE_INPUT_GROUPS: 47259215
INFO  :    REDUCE_INPUT_RECORDS: 47259215
INFO  :    SHUFFLE_BYTES: 808977386
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 1443301721
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 141036712
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 667940674
INFO  :    SHUFFLE_PHASE_TIME: 419560
INFO  :    SPILLED_RECORDS: 47259215
INFO  : TaskCounter_Reducer_2_OUTPUT_Reducer_3:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    DATA_BYTES_VIA_EVENT: 0
INFO  :    OUTPUT_BYTES: 10183848
INFO  :    OUTPUT_BYTES_PHYSICAL: 7771994
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 10193938
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 1009
INFO  :    SPILLED_RECORDS: 0
INFO  : TaskCounter_Reducer_3_INPUT_Reducer_2:
INFO  :    FIRST_EVENT_RECEIVED: 13
INFO  :    INPUT_RECORDS_PROCESSED: 1009
INFO  :    LAST_EVENT_RECEIVED: 73153
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_SHUFFLED_INPUTS: 1009
INFO  :    SHUFFLE_BYTES: 7747778
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 10193938
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 995848
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 6751930
INFO  :    SHUFFLE_PHASE_TIME: 73155
INFO  : TaskCounter_Reducer_3_OUTPUT_out_Reducer_3:
INFO  :    OUTPUT_RECORDS: 0
INFO  : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO  :    GROUPED_INPUT_SPLITS_Map_1: 3348
INFO  :    GROUPED_INPUT_SPLITS_Map_4: 27
INFO  :    INPUT_DIRECTORIES_Map_1: 96
INFO  :    INPUT_DIRECTORIES_Map_4: 1
INFO  :    INPUT_FILES_Map_1: 5843
INFO  :    INPUT_FILES_Map_4: 27
INFO  :    RAW_INPUT_SPLITS_Map_1: 5843
INFO  :    RAW_INPUT_SPLITS_Map_4: 27

 

 

 

This is the query:

 

 

create table mydb.mytbl
as select  col1,col2,...col18 
from mydb.tbl1 a 
inner join mydb.tbl2 b on  concat(trim(col1),trim(col2))=colb where date_partition_col between  date_sub('2021-05-10',95) and '2021-05-10';

 

 

 

This is the explain plan:

 

 

+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| Plan optimized by CBO.                             |
|                                                    |
| Vertex dependency in root stage                    |
| Map 1 <- Map 2 (BROADCAST_EDGE)                    |
|                                                    |
| Stage-0                                            |
|   Fetch Operator                                   |
|     limit:-1                                       |
|     Stage-1                                        |
|       Map 1 vectorized                             |
|       File Output Operator [FS_30]                 |
|         Select Operator [SEL_29] (rows=122690953 width=12844) |
|           Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17"] |
|           Map Join Operator [MAPJOIN_28] (rows=122690953 width=12844) |
|             Conds:SEL_27._col18=RS_25._col0(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17"] |
|           <-Map 2 [BROADCAST_EDGE] vectorized      |
|             BROADCAST [RS_25]                      |
|               PartitionCols:_col0                  |
|               Select Operator [SEL_24] (rows=47259215 width=110) |
|                 Output:["_col0"]                   |
|                 Filter Operator [FIL_23] (rows=47259215 width=110) |
|                   predicate:colb is not null |
|                   TableScan [TS_3] (rows=47259215 width=110) |
|                     mydb@tbl2,b, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["colb"] |
|           <-Select Operator [SEL_27] (rows=111537228 width=12844) |
|               Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17","_col18"] |
|               Filter Operator [FIL_26] (rows=111537228 width=12844) |
|                 predicate:(concat(trim(col1), trim(col2)) is not null and CAST( date_partition_col AS DATE) BETWEEN DATE'2021-02-04' AND DATE'2021-05-10') |
|                 TableScan [TS_0] (rows=1056668483 width=12844) |
|                   mydb@tbl1,a,Tbl:PARTIAL,Col:PARTIAL,Output:["col1","col2",...,"col18"] |

 

 

 

Can someone help me figure out the bottlenecks here?

 

Thanks,

Megh

9 REPLIES 9

avatar
Guru

I see the two large tables are scanned and it is done at Map 1 phase  and scanned around 9,569,213,817 records.

 

predicate:(concat(trim(col1), trim(col2)) is not null and CAST( date_partition_col AS DATE) BETWEEN DATE'2021-02-04' AND DATE'2021-05-10') ==> scanning entire large table

 

Can you try below one:

create table mydb.mytbl
as select col1,col2,...col18 where here date_partition_col between date_sub('2021-05-10',95) and '2021-05-10 from from mydb.tbl1 a
inner join mydb.tbl2 b on concat(trim(col1),trim(col2))=colb;

 

avatar
Community Manager

Hi @vidanimegh, Has @asish's reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. 



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
Expert Contributor

Hi @asish ,

 

Apologies for the late reply.

 

I tried the query you have suggested but still don't observe any difference in the runtime.

 

Thanks,

Megh

avatar
Guru

hi megh

 

Can you  share application logs and beeline trace .

 

Thaanks,

Asish

avatar
Expert Contributor

Hello @asish ,

 

Apologies for not replying earlier.

 

I recently thought that I'll compare a simple count(*) query performance on a single partition on my old cluster vs the current cluster and observe the difference I'm getting in the query counters.

 

What I discovered is that there is one GC_TIME_MILLIS counter which is significantly higher in the current cluster.

 

to give an example, this is the query I ran on both the clusters (Both clusters are having identical hardware and same amount of resources):

select count(*) from mydb.tbl1 where date_partition_col='2021-07-30';

 

Following is the trace from the old cluster:

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED     64         64        0        0       0       0
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 79.65 s
----------------------------------------------------------------------------------------------
INFO  : Status: DAG finished successfully in 79.64 seconds
INFO  :
INFO  : Query Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  : OPERATION                            DURATION
INFO  : ----------------------------------------------------------------------------------------------
INFO  : Compile Query                           0.23s
INFO  : Prepare Plan                            7.90s
INFO  : Get Query Coordinator (AM)              0.21s
INFO  : Submit Plan                             0.24s
INFO  : Start DAG                               0.96s
INFO  : Run DAG                                79.64s
INFO  : ----------------------------------------------------------------------------------------------
INFO  :
INFO  : Task Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : ----------------------------------------------------------------------------------------------
INFO  :      Map 1          73845.00      1,642,830         10,289     127,396,873               95
INFO  :  Reducer 2          31853.00          4,120             71              64                0
INFO  : ----------------------------------------------------------------------------------------------
INFO  :
INFO  : org.apache.tez.common.counters.DAGCounter:
INFO  :    NUM_SUCCEEDED_TASKS: 65
INFO  :    TOTAL_LAUNCHED_TASKS: 65
INFO  :    DATA_LOCAL_TASKS: 7
INFO  :    RACK_LOCAL_TASKS: 57
INFO  :    AM_CPU_MILLISECONDS: 14500
INFO  :    AM_GC_TIME_MILLIS: 0
INFO  : File System Counters:
INFO  :    FILE_BYTES_READ: 300
INFO  :    FILE_BYTES_WRITTEN: 3840
INFO  :    HDFS_BYTES_READ: 39636387430
INFO  :    HDFS_BYTES_WRITTEN: 248
INFO  :    HDFS_READ_OPS: 130
INFO  :    HDFS_WRITE_OPS: 2
INFO  :    HDFS_OP_CREATE: 1
INFO  :    HDFS_OP_GET_FILE_STATUS: 3
INFO  :    HDFS_OP_OPEN: 127
INFO  :    HDFS_OP_RENAME: 1
INFO  : org.apache.tez.common.counters.TaskCounter:
INFO  :    SPILLED_RECORDS: 0
INFO  :    NUM_SHUFFLED_INPUTS: 64
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    GC_TIME_MILLIS: 10360
INFO  :    TASK_DURATION_MILLIS: 1505522
INFO  :    CPU_MILLISECONDS: 1646950
INFO  :    PHYSICAL_MEMORY_BYTES: 139586437120
INFO  :    VIRTUAL_MEMORY_BYTES: 607444127744
INFO  :    COMMITTED_HEAP_BYTES: 139586437120
INFO  :    INPUT_RECORDS_PROCESSED: 127429957
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 39636387430
INFO  :    OUTPUT_RECORDS: 64
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_BYTES: 384
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 896
INFO  :    OUTPUT_BYTES_PHYSICAL: 3328
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    SHUFFLE_BYTES: 1792
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 896
INFO  :    SHUFFLE_BYTES_TO_MEM: 1652
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 140
INFO  :    SHUFFLE_PHASE_TIME: 30751
INFO  :    FIRST_EVENT_RECEIVED: 89
INFO  :    LAST_EVENT_RECEIVED: 30749
INFO  : HIVE:
INFO  :    CREATED_FILES: 1
INFO  :    DESERIALIZE_ERRORS: 0
INFO  :    RECORDS_IN_Map_1: 127396873
INFO  :    RECORDS_OUT_0: 1
INFO  :    RECORDS_OUT_INTERMEDIATE_Map_1: 95
INFO  :    RECORDS_OUT_INTERMEDIATE_Reducer_2: 0
INFO  :    RECORDS_OUT_OPERATOR_FS_13: 1
INFO  :    RECORDS_OUT_OPERATOR_GBY_10: 64
INFO  :    RECORDS_OUT_OPERATOR_GBY_12: 1
INFO  :    RECORDS_OUT_OPERATOR_MAP_0: 0
INFO  :    RECORDS_OUT_OPERATOR_RS_11: 95
INFO  :    RECORDS_OUT_OPERATOR_SEL_9: 127429893
INFO  :    RECORDS_OUT_OPERATOR_TS_0: 127429893
INFO  : TaskCounter_Map_1_INPUT_urcs_transactions:
INFO  :    INPUT_RECORDS_PROCESSED: 127429893
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 39636387430
INFO  : TaskCounter_Map_1_OUTPUT_Reducer_2:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    OUTPUT_BYTES: 384
INFO  :    OUTPUT_BYTES_PHYSICAL: 3328
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 896
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 64
INFO  :    SPILLED_RECORDS: 0
INFO  : TaskCounter_Reducer_2_INPUT_Map_1:
INFO  :    FIRST_EVENT_RECEIVED: 89
INFO  :    INPUT_RECORDS_PROCESSED: 64
INFO  :    LAST_EVENT_RECEIVED: 30749
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_SHUFFLED_INPUTS: 64
INFO  :    SHUFFLE_BYTES: 1792
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 896
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 140
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 1652
INFO  :    SHUFFLE_PHASE_TIME: 30751
INFO  : TaskCounter_Reducer_2_OUTPUT_out_Reducer_2:
INFO  :    OUTPUT_RECORDS: 0
INFO  : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO  :    GROUPED_INPUT_SPLITS_Map_1: 64
INFO  :    INPUT_DIRECTORIES_Map_1: 1
INFO  :    INPUT_FILES_Map_1: 127
INFO  :    RAW_INPUT_SPLITS_Map_1: 127
INFO  : Completed executing command(queryId=hive_20210802162145_c6dc5748-ef38-42fb-becd-46dd33d2dd16); Time taken: 88.954 seconds
INFO  : OK
+------------+
|    _c0     |
+------------+
| 127429893  |
+------------+
1 row selected (89.234 seconds)

 

Same query trace from the new cluster:

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED     64         64        0        0       0       0
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 176.45 s
----------------------------------------------------------------------------------------------
INFO  : Status: DAG finished successfully in 176.42 seconds
INFO  :
INFO  : Query Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  : OPERATION                            DURATION
INFO  : ----------------------------------------------------------------------------------------------
INFO  : Compile Query                           0.14s
INFO  : Prepare Plan                            0.04s
INFO  : Get Query Coordinator (AM)              0.00s
INFO  : Submit Plan                             4.79s
INFO  : Start DAG                               0.04s
INFO  : Run DAG                               176.42s
INFO  : ----------------------------------------------------------------------------------------------
INFO  :
INFO  : Task Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : ----------------------------------------------------------------------------------------------
INFO  :      Map 1         173773.00      9,458,900        151,977     127,398,948               95
INFO  :  Reducer 2          53863.00          2,490              0              64                0
INFO  : ----------------------------------------------------------------------------------------------
INFO  :
INFO  : org.apache.tez.common.counters.DAGCounter:
INFO  :    NUM_SUCCEEDED_TASKS: 65
INFO  :    TOTAL_LAUNCHED_TASKS: 65
INFO  :    DATA_LOCAL_TASKS: 11
INFO  :    RACK_LOCAL_TASKS: 53
INFO  :    AM_CPU_MILLISECONDS: 22160
INFO  :    AM_GC_TIME_MILLIS: 0
INFO  : File System Counters:
INFO  :    FILE_BYTES_READ: 480
INFO  :    FILE_BYTES_WRITTEN: 3840
INFO  :    HDFS_BYTES_READ: 39636387430
INFO  :    HDFS_BYTES_WRITTEN: 109
INFO  :    HDFS_READ_OPS: 129
INFO  :    HDFS_WRITE_OPS: 2
INFO  :    HDFS_OP_CREATE: 1
INFO  :    HDFS_OP_GET_FILE_STATUS: 2
INFO  :    HDFS_OP_OPEN: 127
INFO  :    HDFS_OP_RENAME: 1
INFO  : org.apache.tez.common.counters.TaskCounter:
INFO  :    SPILLED_RECORDS: 0
INFO  :    NUM_SHUFFLED_INPUTS: 64
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    GC_TIME_MILLIS: 151977
INFO  :    TASK_DURATION_MILLIS: 4806095
INFO  :    CPU_MILLISECONDS: 9461390
INFO  :    PHYSICAL_MEMORY_BYTES: 318765006848
INFO  :    VIRTUAL_MEMORY_BYTES: 607499304960
INFO  :    COMMITTED_HEAP_BYTES: 318765006848
INFO  :    INPUT_RECORDS_PROCESSED: 127429957
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 39636387430
INFO  :    OUTPUT_RECORDS: 64
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_BYTES: 384
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 896
INFO  :    OUTPUT_BYTES_PHYSICAL: 3328
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    SHUFFLE_BYTES: 1792
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 896
INFO  :    SHUFFLE_BYTES_TO_MEM: 1568
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 224
INFO  :    SHUFFLE_PHASE_TIME: 54159
INFO  :    FIRST_EVENT_RECEIVED: 92
INFO  :    LAST_EVENT_RECEIVED: 54149
INFO  :    DATA_BYTES_VIA_EVENT: 0
INFO  : HIVE:
INFO  :    CREATED_FILES: 1
INFO  :    DESERIALIZE_ERRORS: 0
INFO  :    RECORDS_IN_Map_1: 127398948
INFO  :    RECORDS_OUT_0: 1
INFO  :    RECORDS_OUT_INTERMEDIATE_Map_1: 95
INFO  :    RECORDS_OUT_INTERMEDIATE_Reducer_2: 0
INFO  :    RECORDS_OUT_OPERATOR_FS_13: 1
INFO  :    RECORDS_OUT_OPERATOR_GBY_10: 64
INFO  :    RECORDS_OUT_OPERATOR_GBY_12: 1
INFO  :    RECORDS_OUT_OPERATOR_MAP_0: 0
INFO  :    RECORDS_OUT_OPERATOR_RS_11: 95
INFO  :    RECORDS_OUT_OPERATOR_SEL_9: 127429893
INFO  :    RECORDS_OUT_OPERATOR_TS_0: 127429893
INFO  : TaskCounter_Map_1_INPUT_urcs_transactions:
INFO  :    INPUT_RECORDS_PROCESSED: 127429893
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 39636387430
INFO  : TaskCounter_Map_1_OUTPUT_Reducer_2:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    DATA_BYTES_VIA_EVENT: 0
INFO  :    OUTPUT_BYTES: 384
INFO  :    OUTPUT_BYTES_PHYSICAL: 3328
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 896
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 64
INFO  :    SPILLED_RECORDS: 0
INFO  : TaskCounter_Reducer_2_INPUT_Map_1:
INFO  :    FIRST_EVENT_RECEIVED: 92
INFO  :    INPUT_RECORDS_PROCESSED: 64
INFO  :    LAST_EVENT_RECEIVED: 54149
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_SHUFFLED_INPUTS: 64
INFO  :    SHUFFLE_BYTES: 1792
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 896
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 224
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 1568
INFO  :    SHUFFLE_PHASE_TIME: 54159
INFO  : TaskCounter_Reducer_2_OUTPUT_out_Reducer_2:
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED     64         64        0        0       0       0
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 176.45 s
----------------------------------------------------------------------------------------------
+------------+
|    _c0     |
+------------+
| 127429893  |
+------------+
1 row selected (181.488 seconds)

 

The only significant difference I see between these two is for GC_TIME_MILLIS.

 

I tried to tune multiple Hive and YARN config settings related to Heap Memory and GC but there is no difference observed.

 

Any ideas?

 

Thanks,

Megh

avatar
Guru
Hi Megh,

Thanks for the logs.

I found a discrepancy in HDP logs,as per "Query counters on HDP Cluster.txt" ,the Mapper spawned is 64 but from the application_1627033595344_12159_hdp.log,I observe it is 127.

        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED     64         64        0        0       0       0
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------


grep "Event:TASK_STARTED]: vertexName=Map 1" syslog_dag_1627033595344_12159_1 | wc -l
     127
grep "Event:TASK_FINISHED]: vertexName=Map 1" syslog_dag_1627033595344_12159_1 | wc -l
     127

 Question: have you provided correct application log for HDP ? As I see a discrepancy.

Assuming you have provided correct set of logs for HDO,I compared between CDP and HDP,I see that input records for each maper is more in CDP then HDP that is the reason it took more tie in CDP.

I have collected samle taskID and INPUT_RECORDS_PROCESSED for each maper.

application_1627885418606_0016_cdp.log
======

each Mapper  took close to 1 min to finish

taskId=task_1627885418606_0016_1_00_000059 INPUT_RECORDS_PROCESSED=2012469
taskId=task_1627885418606_0016_1_00_000061 INPUT_RECORDS_PROCESSED=2010369

taskId=task_1627885418606_0016_1_00_000026 INPUT_RECORDS_PROCESSED=2021105


application_1627033595344_12159_hdp.log
==========

 taskId=task_1627033595344_12159_1_00_000062 INPUT_RECORDS_PROCESSED=1010250
  taskId=task_1627033595344_12159_1_00_000126 INPUT_RECORDS_PROCESSED=376062


Thanks,
Asish

avatar
Community Manager

@vidanimegh, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.  



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
Expert Contributor

Hi @VidyaSargur ,

 

We're having an open thread on Cloudera support case, where @asish and I are discussing this same issue.

 

As and when we get a resolution for this, We'll update this question and close.

 

Thanks,

Megh

avatar
Community Manager

@vidanimegh, Sure. Thanks for your response, will wait for your update. 



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community: