Impala query thrift encoding missing some fields
Created on ‎07-15-2020 11:56 AM - edited ‎09-16-2022 07:38 AM
Hi,
I am running Cloudera 6.3.1 and Impala Version -> impalad version 3.2.0-cdh6.3.2 RELEASE (build 1bb9836227301b839a32c6bc230e35439d5984ac)
When I look at the Thrift query profiles, I don't see anything other than the state field being filled in for exec_summary in the struct below:
struct TRuntimeProfileTree {
1: required list<TRuntimeProfileNode> nodes
2: optional ExecStats.TExecSummary exec_summary
}
Only the state field in exec_summary is populated; every other field is unset.
Is there a flag that needs to be enabled on the Impala daemon side for this?
Created ‎07-15-2020 12:14 PM
The exec summary isn't always going to be valid. It's only relevant if there was some execution that happened like in SELECT queries or DML. It won't be there for DDL. It's also only updated at certain points in the query running, so may not be present or up to date if the query hits an error or finishes early.
Created ‎07-15-2020 12:19 PM
I am checking a completed query, which is a SELECT statement, and I see that the Query Summary (which has the ExecSummary string) is filled, but the exec_summary struct above is not.
Created ‎07-15-2020 01:58 PM
I'm not sure exactly what is going on there, then. We could always investigate if we had an example.
But I'd expect that using the full profile tree for information about query status, etc. is more robust, since it's kept up to date throughout the query. The exec_summary is a more recent add-on to the profile and is updated in a different way.
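For anyone reading along, here is a rough Python sketch of what "using the full profile tree" could look like on the client side. It assumes the profile has already been deserialized and that each node exposes a `name` and an `info_strings` map. `ProfileNode` is a hypothetical stand-in for the generated `TRuntimeProfileNode` class, and the sample values are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProfileNode:
    """Hypothetical stand-in for a deserialized TRuntimeProfileNode."""
    name: str
    info_strings: Dict[str, str] = field(default_factory=dict)


def find_info_string(nodes: List[ProfileNode], key: str) -> Optional[str]:
    """Return the first value stored under `key` in any node's info_strings."""
    for node in nodes:
        value = node.info_strings.get(key)
        if value is not None:
            return value
    return None


# Made-up sample data, shaped like the flattened `nodes` list of the tree.
nodes = [
    ProfileNode("Query (id=example)"),
    ProfileNode("Summary", {"Query State": "FINISHED",
                            "ExecSummary": "\nOperator #Hosts Avg Time"}),
]

print(find_info_string(nodes, "Query State"))
print(find_info_string(nodes, "ExecSummary") is not None)
```

With the real generated classes, the same function should work on `tree.nodes` as long as those attribute names match; that is an assumption to verify against the Thrift definitions shipped with your Impala version.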
Created ‎07-15-2020 06:32 PM
OK. Are there any Impala options that need to be set in order to get the full profile tree? In other words, does profile creation always populate all the fields unconditionally?
Created ‎07-15-2020 06:56 PM
Yeah this isn't configurable.
Created ‎07-15-2020 06:59 PM
So, would Impala by default generate the full profile tree (with all the fields)?
Created ‎07-16-2020 02:19 PM
@Tim Armstrong I verified this by running bin/parse-thrift-profile.py on the profile generated by Impala 3.2. I don't see the exec_summary (the top-level struct) being filled, whereas the ExecSummary string is filled perfectly:
...
...
ExecSummary': '\nOperator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail \n--------------------------------------------------------------------------------------------------------------------------------------------------------------------\nF06:ROOT 1 0.000ns 0.000ns 0 0 \n17:MERGING-EXCHANGE 1 0.000ns 0.000ns 100 100 16.00 KB 16.00 KB UNPARTITIONED \nF04:EXCHANGE SENDER 1 0.000ns 0.000ns 2.09 KB 0 \n10:TOP-N 1 1.000ms 1.000ms 100 100 60.00 KB 5.66 KB \n09:HASH JOIN 1 1.000ms 1.000ms 26.29K 5.76M 12.05 MB 8.50 MB INNER JOIN, BROADCAST \n|--16:EXCHANGE 1 0.000ns 0.000ns 144.00K 144.00K 9.88 MB 4.15 MB BROADCAST \n| F05:EXCHANGE SENDER 1 36.000ms 36.000ms 3.88 KB 0 \n| 08:SCAN HDFS 1 135.001ms 135.001ms 144.00K 144.00K 11.56 MB 48.00 MB tpcds_bin_partitioned_orc_2.customer \n15:AGGREGATE 1 8.000ms 8.000ms 26.29K 5.76M 34.05 MB 128.00 MB FINALIZE \n14:EXCHANGE 1 0.000ns 0.000ns 26.29K 5.76M 200.00 KB 10.05 MB HASH(ss_ticket_number,ss_customer_sk,ss_addr_sk,store.s_city) \nF00:EXCHANGE SENDER 1 8.000ms 8.000ms 2.41 KB 0 \n07:AGGREGATE 1 22.000ms 22.000ms 26.29K 5.76M 34.34 MB 128.00 MB STREAMING \n06:HASH JOIN 1 14.000ms 14.000ms 280.27K 5.76M 2.18 MB 1.94 MB INNER JOIN, BROADCAST \n|--13:EXCHANGE 1 0.000ns 0.000ns 5.04K 720 184.00 KB 25.31 KB BROADCAST \n| F03:EXCHANGE SENDER 1 1.000ms 1.000ms 7.52 KB 0 \n| 03:SCAN HDFS 1 24.000ms 24.000ms 5.04K 720 877.14 KB 48.00 MB tpcds_bin_partitioned_orc_2.household_demographics \n05:HASH JOIN 1 6.000ms 6.000ms 280.27K 5.76M 2.11 MB 1.94 MB INNER JOIN, BROADCAST \n|--12:EXCHANGE 1 0.000ns 0.000ns 21 2 16.00 KB 16.00 KB BROADCAST \n| F02:EXCHANGE SENDER 1 0.000ns 0.000ns 5.12 KB 0 \n| 02:SCAN HDFS 1 31.000ms 31.000ms 21 2 712.07 KB 48.00 MB tpcds_bin_partitioned_orc_2.store \n04:HASH JOIN 1 9.000ms 9.000ms 280.27K 5.76M 2.03 MB 1.94 MB INNER JOIN, BROADCAST \n|--11:EXCHANGE 1 0.000ns 0.000ns 156 7.30K 16.00 KB 134.14 KB BROADCAST \n| F01:EXCHANGE SENDER 1 0.000ns 0.000ns 7.52 KB 0 \n| 01:SCAN HDFS 1 41.000ms 41.000ms 156 7.30K 969.86 KB 48.00 MB tpcds_bin_partitioned_orc_2.date_dim \n00:SCAN HDFS 1 1s109ms 1s109ms 280.27K 5.76M 34.90 MB 64.00 MB tpcds_bin_partitioned_orc_2.store_sales'
...
...
exec_summary=TExecSummary(status=None, is_queued=None, queued_reason=None, error_logs=None, state=0, progress=None, nodes=None, exch_to_sender_map=None
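To make a check like this repeatable across many profiles, a small helper along these lines might help. It is only a sketch: `ExecSummaryStandIn` is a hypothetical stand-in that mirrors the field names printed above, and the helper assumes the real Thrift-generated object keeps its fields as plain instance attributes.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ExecSummaryStandIn:
    """Hypothetical stand-in mirroring the field names in the output above."""
    status: Any = None
    is_queued: Optional[bool] = None
    queued_reason: Optional[str] = None
    error_logs: Optional[list] = None
    state: Optional[int] = None
    progress: Any = None
    nodes: Optional[list] = None
    exch_to_sender_map: Optional[dict] = None


def populated_fields(summary: Any) -> List[str]:
    """List the attribute names whose value is not None.

    Compares against None rather than using truthiness, so a legitimate
    value such as state=0 still counts as populated.
    """
    return [name for name, value in vars(summary).items() if value is not None]


# Same shape as the output above: only `state` is set.
summary = ExecSummaryStandIn(state=0)
print(populated_fields(summary))  # ['state']
```

A result containing only `state` for a finished SELECT query would match the behavior described in this thread.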
Created ‎07-21-2020 09:36 AM
@hsri it seems like this would merit some more investigation - this was added as a nicety a little while back but it may not be working as expected. If you can reproduce this with a simple query, could you file a bug on Apache Impala? https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala
