Created on 04-11-2019 07:31 AM - edited 09-16-2022 07:18 AM
Hi friends,
I am working on optimizing the Impala Query performance, for that purpose I am using query options in JDBC URL.
After running the query I am trying to analyse the Plan, Text Plan, Summary and Profile.
(http://host:25000/query_plan?query_id=fsdfsf432a:997fre34ss000000)
But I don't which thing I should consider and how to interpret? I can see lot of terminologies but not able to understand any of them. I could not come to conclusion for performance. I don't which thing I need to modify after observing the Text Plan or Summary or Profile.
Which is more important Summary or Text Plan or Profile?
Could anyone please tell guide me how to understand/interpret below terminology-
Query -
*******************************************************
select ss_sold_time_sk, time_dim.t_time_sk, ss_hdemo_sk, household_demographics.hd_demo_sk, ss_store_sk, s_store_sk, t_hour, t_minute, household_demographics.hd_dep_count, store.s_store_name from store_sales ,household_demographics ,time_dim, store where ss_sold_time_sk = time_dim.t_time_sk and ss_hdemo_sk = household_demographics.hd_demo_sk and ss_store_sk = s_store_sk and time_dim.t_hour = 8 and time_dim.t_minute >= 30 and household_demographics.hd_dep_count = 5 and store.s_store_name = 'ese' limit 10000;
*********************************************************************
Text Plan (Why there are indentation and numbers like 10:EXCHANGE [UNPARTITIONED] then 09:EXCHANGE [BROADCAST] )
----------------
Estimated Per-Host Requirements: Memory=384.15MB VCores=4
WARNING: The following tables are missing relevant table and/or column statistics.
tpcds_bin_partitioned_textfile_40.household_demographics, tpcds_bin_partitioned_textfile_40.time_dim
10:EXCHANGE [UNPARTITIONED]
| limit: 10000
| hosts=2 per-host-mem=unavailable
| tuple-ids=0,3,1,2 row-size=80B cardinality=10000
|
06:HASH JOIN [INNER JOIN, BROADCAST]
| hash predicates: ss_sold_time_sk = time_dim.t_time_sk
| runtime filters: RF000 <- time_dim.t_time_sk
| limit: 10000
| hosts=2 per-host-mem=148.50KB
| tuple-ids=0,3,1,2 row-size=80B cardinality=10000
|
|--09:EXCHANGE [BROADCAST]
| | hosts=1 per-host-mem=0B
| | tuple-ids=2 row-size=16B cardinality=8640
| |
| 02:SCAN HDFS [tpcds_bin_partitioned_textfile_40.time_dim, RANDOM]
| partitions=1/1 files=1 size=4.79MB
| predicates: time_dim.t_hour = 8, time_dim.t_minute >= 30
| table stats: 86400 rows total
| column stats: unavailable
| hosts=1 per-host-mem=32.00MB
| tuple-ids=2 row-size=16B cardinality=8640
|
05:HASH JOIN [INNER JOIN, BROADCAST]
| hash predicates: ss_hdemo_sk = household_demographics.hd_demo_sk
| runtime filters: RF001 <- household_demographics.hd_demo_sk
| hosts=2 per-host-mem=9.28KB
| tuple-ids=0,3,1 row-size=64B cardinality=22033737
|
|--08:EXCHANGE [BROADCAST]
| | hosts=1 per-host-mem=0B
| | tuple-ids=1 row-size=12B cardinality=720
| |
| 01:SCAN HDFS [tpcds_bin_partitioned_textfile_40.household_demographics, RANDOM]
| partitions=1/1 files=1 size=141.07KB
| predicates: household_demographics.hd_dep_count = 5
| table stats: 7200 rows total
| column stats: unavailable
| hosts=1 per-host-mem=32.00MB
| tuple-ids=1 row-size=12B cardinality=720
|
04:HASH JOIN [INNER JOIN, BROADCAST]
| hash predicates: ss_store_sk = s_store_sk
| runtime filters: RF002 <- s_store_sk
| hosts=2 per-host-mem=339B
| tuple-ids=0,3 row-size=52B cardinality=22033737
|
|--07:EXCHANGE [BROADCAST]
| | hosts=1 per-host-mem=0B
| | tuple-ids=3 row-size=28B cardinality=11
| |
| 03:SCAN HDFS [tpcds_bin_partitioned_textfile_40.store, RANDOM]
| partitions=1/1 files=1 size=29.34KB
| predicates: store.s_store_name = 'ese'
| table stats: 112 rows total
| column stats: all
| hosts=1 per-host-mem=32.00MB
| tuple-ids=3 row-size=28B cardinality=11
|
00:SCAN HDFS [tpcds_bin_partitioned_textfile_40.store_sales, RANDOM]
partitions=1824/1824 files=1824 size=13.84GB
runtime filters: RF000 -> ss_sold_time_sk, RF001 -> ss_hdemo_sk, RF002 -> ss_store_sk
table stats: 115203420 rows total
column stats: all
hosts=2 per-host-mem=384.00MB
tuple-ids=0 row-size=24B cardinality=115203420
----------------
******************************************************************
Exec Summary (What to read and how to interpret from below sheet)
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail --------------------------------------------------------------------------------------------------------------------------- 10:EXCHANGE 1 1.492ms 1.492ms 10.00K 10.00K 0 -1.00 B UNPARTITIONED 06:HASH JOIN 0 0.000ns 0.000ns 0 10.00K 0 148.50 KB INNER JOIN, BROADCAST |--09:EXCHANGE 0 0.000ns 0.000ns 0 8.64K 0 0 BROADCAST | 02:SCAN HDFS 1 55.911ms 55.911ms 1.80K 8.64K 8.07 MB 32.00 MB tpcds_bin_partitioned_textf... 05:HASH JOIN 0 0.000ns 0.000ns 0 22.03M 0 9.28 KB INNER JOIN, BROADCAST |--08:EXCHANGE 0 0.000ns 0.000ns 0 720 0 0 BROADCAST | 01:SCAN HDFS 1 32.422ms 32.422ms 720 720 284.00 KB 32.00 MB tpcds_bin_partitioned_textf... 04:HASH JOIN 0 0.000ns 0.000ns 0 22.03M 0 339.00 B INNER JOIN, BROADCAST |--07:EXCHANGE 0 0.000ns 0.000ns 0 11 0 0 BROADCAST | 03:SCAN HDFS 1 50.174ms 50.174ms 15 11 108.00 KB 32.00 MB tpcds_bin_partitioned_textf... 00:SCAN HDFS 0 0.000ns 0.000ns 0 115.20M 0 384.00 MB tpcds_bin_partitioned_textf...
************************************************
Profile is very big output therefore not posting here.
Plese help me understanding above data. So far I am just reading articles but not able to come to concrete conclusion, everything is just imagination.
Created 04-19-2019 02:30 PM