Created on 07-15-2020 11:56 AM - edited 09-16-2022 07:38 AM
Hi,
I am running Cloudera 6.3.1 with Impala: impalad version 3.2.0-cdh6.3.2 RELEASE (build 1bb9836227301b839a32c6bc230e35439d5984ac)
When I look at the Thrift query profiles, I don't see anything other than the state field being filled in for exec_summary in the struct below:
struct TRuntimeProfileTree {
1: required list<TRuntimeProfileNode> nodes
2: optional ExecStats.TExecSummary exec_summary
}
Only the state field in exec_summary is populated, and nothing else.
Is there a flag that needs to be enabled on the Impala daemon side for this?
Created 07-15-2020 12:14 PM
The exec summary isn't always going to be valid. It's only relevant if some execution actually happened, as in SELECT queries or DML. It won't be there for DDL. It's also only updated at certain points while the query is running, so it may not be present or up to date if the query hits an error or finishes early.
Created 07-15-2020 12:19 PM
I am checking a completed query, which is a SELECT statement, and I see that the Query Summary (which contains the ExecSummary string) is filled in, but the exec_summary struct above is not.
Created 07-15-2020 01:58 PM
I'm not sure exactly what is going on there, then. We could always investigate if we had an example.
But I'd expect that using the full profile tree for information about query status, etc. is more robust, since it's kept up to date throughout the query. The exec_summary is a more recent add-on to the profile and is updated in a different way.
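As a rough illustration of reading status from the profile tree rather than from exec_summary, here is a minimal Python sketch. It assumes the profile has already been deserialized and that the nodes list carries per-node info strings; ProfileNode is a hypothetical stand-in for the generated TRuntimeProfileNode class, find_info_string is a made-up helper, and the sample values are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProfileNode:
    """Hypothetical stand-in for a deserialized TRuntimeProfileNode."""
    name: str
    num_children: int = 0
    info_strings: Dict[str, str] = field(default_factory=dict)


def find_info_string(nodes: List[ProfileNode], key: str) -> Optional[str]:
    """Return the first info string stored under `key`, scanning nodes in order."""
    for node in nodes:
        if key in node.info_strings:
            return node.info_strings[key]
    return None


# Invented sample data shaped like a flattened profile tree.
sample_nodes = [
    ProfileNode("Query (id=example)", num_children=1),
    ProfileNode(
        "Summary",
        info_strings={
            "Query State": "FINISHED",
            "ExecSummary": "\nOperator #Hosts Avg Time ...",
        },
    ),
]

query_state = find_info_string(sample_nodes, "Query State")
exec_summary_text = find_info_string(sample_nodes, "ExecSummary")
```

With the real generated classes, the same loop should work on TRuntimeProfileTree.nodes as long as each node exposes an info_strings map; check that against the Thrift definitions shipped with your Impala version.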
Created 07-15-2020 06:32 PM
OK. Are there any Impala options that need to be set in order to get the full profile tree? In other words, does profile creation always populate all the fields unconditionally?
Created 07-15-2020 06:56 PM
Yeah this isn't configurable.
Created 07-15-2020 06:59 PM
So, would Impala by default generate the full profile tree (with all the fields)?
Created 07-16-2020 02:19 PM
@Tim Armstrong I verified this by running bin/parse-thrift-profile.py on a profile generated by Impala 3.2. I don't see the exec_summary (the top-level struct) being filled, whereas the ExecSummary string is filled in perfectly:
...
...
ExecSummary': '\nOperator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail \n--------------------------------------------------------------------------------------------------------------------------------------------------------------------\nF06:ROOT 1 0.000ns 0.000ns 0 0 \n17:MERGING-EXCHANGE 1 0.000ns 0.000ns 100 100 16.00 KB 16.00 KB UNPARTITIONED \nF04:EXCHANGE SENDER 1 0.000ns 0.000ns 2.09 KB 0 \n10:TOP-N 1 1.000ms 1.000ms 100 100 60.00 KB 5.66 KB \n09:HASH JOIN 1 1.000ms 1.000ms 26.29K 5.76M 12.05 MB 8.50 MB INNER JOIN, BROADCAST \n|--16:EXCHANGE 1 0.000ns 0.000ns 144.00K 144.00K 9.88 MB 4.15 MB BROADCAST \n| F05:EXCHANGE SENDER 1 36.000ms 36.000ms 3.88 KB 0 \n| 08:SCAN HDFS 1 135.001ms 135.001ms 144.00K 144.00K 11.56 MB 48.00 MB tpcds_bin_partitioned_orc_2.customer \n15:AGGREGATE 1 8.000ms 8.000ms 26.29K 5.76M 34.05 MB 128.00 MB FINALIZE \n14:EXCHANGE 1 0.000ns 0.000ns 26.29K 5.76M 200.00 KB 10.05 MB HASH(ss_ticket_number,ss_customer_sk,ss_addr_sk,store.s_city) \nF00:EXCHANGE SENDER 1 8.000ms 8.000ms 2.41 KB 0 \n07:AGGREGATE 1 22.000ms 22.000ms 26.29K 5.76M 34.34 MB 128.00 MB STREAMING \n06:HASH JOIN 1 14.000ms 14.000ms 280.27K 5.76M 2.18 MB 1.94 MB INNER JOIN, BROADCAST \n|--13:EXCHANGE 1 0.000ns 0.000ns 5.04K 720 184.00 KB 25.31 KB BROADCAST \n| F03:EXCHANGE SENDER 1 1.000ms 1.000ms 7.52 KB 0 \n| 03:SCAN HDFS 1 24.000ms 24.000ms 5.04K 720 877.14 KB 48.00 MB tpcds_bin_partitioned_orc_2.household_demographics \n05:HASH JOIN 1 6.000ms 6.000ms 280.27K 5.76M 2.11 MB 1.94 MB INNER JOIN, BROADCAST \n|--12:EXCHANGE 1 0.000ns 0.000ns 21 2 16.00 KB 16.00 KB BROADCAST \n| F02:EXCHANGE SENDER 1 0.000ns 0.000ns 5.12 KB 0 \n| 02:SCAN HDFS 1 31.000ms 31.000ms 21 2 712.07 KB 48.00 MB tpcds_bin_partitioned_orc_2.store \n04:HASH JOIN 1 9.000ms 9.000ms 280.27K 5.76M 2.03 MB 1.94 MB INNER JOIN, BROADCAST \n|--11:EXCHANGE 1 0.000ns 0.000ns 156 7.30K 16.00 KB 134.14 KB BROADCAST \n| F01:EXCHANGE SENDER 1 0.000ns 0.000ns 7.52 KB 0 \n| 01:SCAN HDFS 1 41.000ms 41.000ms 156 7.30K 969.86 KB 48.00 MB tpcds_bin_partitioned_orc_2.date_dim \n00:SCAN HDFS 1 1s109ms 1s109ms 280.27K 5.76M 34.90 MB 64.00 MB tpcds_bin_partitioned_orc_2.store_sales'
...
...
exec_summary=TExecSummary(status=None, is_queued=None, queued_reason=None, error_logs=None, state=0, progress=None, nodes=None, exch_to_sender_map=None
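For anyone trying to reproduce this, a quick way to see which fields of a deserialized struct are actually set is to list its non-None attributes. The sketch below is only an illustration: it uses SimpleNamespace as a stand-in for the generated TExecSummary object, with the field values copied from the output above, and assumes the real object keeps its fields as plain instance attributes (which may not hold if the bindings were generated with slots). populated_fields is a made-up helper name.

```python
from types import SimpleNamespace


def populated_fields(thrift_obj):
    """Return the sorted names of attributes whose value is not None."""
    return sorted(
        name for name, value in vars(thrift_obj).items() if value is not None
    )


# Stand-in for the deserialized exec_summary; values copied from the post above.
summary = SimpleNamespace(
    status=None,
    is_queued=None,
    queued_reason=None,
    error_logs=None,
    state=0,
    progress=None,
    nodes=None,
    exch_to_sender_map=None,
)

print(populated_fields(summary))  # prints ['state']
```

Note that state=0 counts as populated here because the check is against None rather than truthiness, which matches what the dump shows: state is the only field carrying a value.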
Created 07-21-2020 09:36 AM
@hsri it seems like this would merit some more investigation - this was added as a nicety a little while back but it may not be working as expected. If you can reproduce this with a simple query, could you file a bug on Apache Impala? https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala