Impala query thrift encoding missing some fields

New Contributor

Hi,

I am running Cloudera 6.3.1, and the Impala version is: impalad version 3.2.0-cdh6.3.2 RELEASE (build 1bb9836227301b839a32c6bc230e35439d5984ac)

When I look at the Thrift query profiles, I don't see anything other than the state field being filled in exec_summary in the struct below:

struct TRuntimeProfileTree {
  1: required list<TRuntimeProfileNode> nodes
  2: optional ExecStats.TExecSummary exec_summary
}

Only the state field in exec_summary is populated; every other field is unset.

Are there any flags that need to be enabled on the Impala daemon side for this?


The exec summary isn't always going to be valid. It's only relevant if some execution actually happened, as in SELECT queries or DML. It won't be there for DDL. It's also only updated at certain points while the query is running, so it may not be present or up to date if the query hits an error or finishes early.

New Contributor

I am checking a completed query, which is a SELECT statement, and I see that the Query Summary (which has the ExecSummary string) is filled, but the exec_summary struct above is not.

I'm not sure exactly what is going on there, then. We could always investigate if we had an example.

But I'd expect that using the full profile tree for information about query status, etc. is more robust, since it's kept up to date throughout the query. The exec_summary is a more recent add-on to the profile and is updated in a different way.
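As a rough illustration of reading status from the profile tree rather than from exec_summary, here is a minimal Python sketch. It assumes the tree has already been deserialized, and it uses a hypothetical stand-in class (`ProfileNode`) that mirrors only the `name`, `num_children`, and `info_strings` fields of `TRuntimeProfileNode`. The "Query State" key and the sample values are assumptions for the example; "ExecSummary" is the key that shows up in the parsed profile later in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProfileNode:
    """Hypothetical stand-in for TRuntimeProfileNode (only the fields used here)."""
    name: str
    num_children: int = 0
    info_strings: Dict[str, str] = field(default_factory=dict)


def find_info_string(nodes: List[ProfileNode], key: str) -> Optional[str]:
    """Return the first info string stored under `key` in any node, or None."""
    for node in nodes:
        if key in node.info_strings:
            return node.info_strings[key]
    return None


# Made-up sample data standing in for TRuntimeProfileTree.nodes.
nodes = [
    ProfileNode("Query", num_children=2),
    ProfileNode("Summary", info_strings={
        "Query State": "FINISHED",        # assumed key name
        "ExecSummary": "\nOperator ...",  # truncated placeholder text
    }),
    ProfileNode("ImpalaServer"),
]

print(find_info_string(nodes, "Query State"))
print(find_info_string(nodes, "ExecSummary") is not None)
```

With a real profile, the same loop would run over the `nodes` list of the deserialized `TRuntimeProfileTree` instead of the sample list.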

New Contributor

OK. Are there any Impala options that need to be set in order to get the full profile tree? I mean, does profile creation always populate all the fields unconditionally?

Yeah, this isn't configurable.

New Contributor

So, would Impala by default generate the full profile tree (with all the fields)?

New Contributor

@Tim Armstrong I verified this by running bin/parse-thrift-profile.py on a profile generated by Impala 3.2. The exec_summary (the top-level struct) is not filled, whereas the ExecSummary string is filled perfectly:

...
...
ExecSummary': '\nOperator                 #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail                                                        \n--------------------------------------------------------------------------------------------------------------------------------------------------------------------\nF06:ROOT                      1    0.000ns    0.000ns                               0              0                                                                \n17:MERGING-EXCHANGE           1    0.000ns    0.000ns      100         100   16.00 KB       16.00 KB  UNPARTITIONED                                                 \nF04:EXCHANGE SENDER           1    0.000ns    0.000ns                         2.09 KB              0                                                                \n10:TOP-N                      1    1.000ms    1.000ms      100         100   60.00 KB        5.66 KB                                                                \n09:HASH JOIN                  1    1.000ms    1.000ms   26.29K       5.76M   12.05 MB        8.50 MB  INNER JOIN, BROADCAST                                         \n|--16:EXCHANGE                1    0.000ns    0.000ns  144.00K     144.00K    9.88 MB        4.15 MB  BROADCAST                                                     \n|  F05:EXCHANGE SENDER        1   36.000ms   36.000ms                         3.88 KB              0                                                                \n|  08:SCAN HDFS               1  135.001ms  135.001ms  144.00K     144.00K   11.56 MB       48.00 MB  tpcds_bin_partitioned_orc_2.customer                          \n15:AGGREGATE                  1    8.000ms    8.000ms   26.29K       5.76M   34.05 MB      128.00 MB  FINALIZE                                                      \n14:EXCHANGE                   1    0.000ns    0.000ns   26.29K       5.76M  200.00 KB       10.05 MB  
HASH(ss_ticket_number,ss_customer_sk,ss_addr_sk,store.s_city) \nF00:EXCHANGE SENDER           1    8.000ms    8.000ms                         2.41 KB              0                                                                \n07:AGGREGATE                  1   22.000ms   22.000ms   26.29K       5.76M   34.34 MB      128.00 MB  STREAMING                                                     \n06:HASH JOIN                  1   14.000ms   14.000ms  280.27K       5.76M    2.18 MB        1.94 MB  INNER JOIN, BROADCAST                                         \n|--13:EXCHANGE                1    0.000ns    0.000ns    5.04K         720  184.00 KB       25.31 KB  BROADCAST                                                     \n|  F03:EXCHANGE SENDER        1    1.000ms    1.000ms                         7.52 KB              0                                                                \n|  03:SCAN HDFS               1   24.000ms   24.000ms    5.04K         720  877.14 KB       48.00 MB  tpcds_bin_partitioned_orc_2.household_demographics            \n05:HASH JOIN                  1    6.000ms    6.000ms  280.27K       5.76M    2.11 MB        1.94 MB  INNER JOIN, BROADCAST                                         \n|--12:EXCHANGE                1    0.000ns    0.000ns       21           2   16.00 KB       16.00 KB  BROADCAST                                                     \n|  F02:EXCHANGE SENDER        1    0.000ns    0.000ns                         5.12 KB              0                                                                \n|  02:SCAN HDFS               1   31.000ms   31.000ms       21           2  712.07 KB       48.00 MB  tpcds_bin_partitioned_orc_2.store                             \n04:HASH JOIN                  1    9.000ms    9.000ms  280.27K       5.76M    2.03 MB        1.94 MB  INNER JOIN, BROADCAST                                         \n|--11:EXCHANGE                1    0.000ns    0.000ns      156       7.30K   16.00 KB      134.14 KB  
BROADCAST                                                     \n|  F01:EXCHANGE SENDER        1    0.000ns    0.000ns                         7.52 KB              0                                                                \n|  01:SCAN HDFS               1   41.000ms   41.000ms      156       7.30K  969.86 KB       48.00 MB  tpcds_bin_partitioned_orc_2.date_dim                          \n00:SCAN HDFS                  1    1s109ms    1s109ms  280.27K       5.76M   34.90 MB       64.00 MB  tpcds_bin_partitioned_orc_2.store_sales'
...
...
exec_summary=TExecSummary(status=None, is_queued=None, queued_reason=None, error_logs=None, state=0, progress=None, nodes=None, exch_to_sender_map=None
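A quick way to make the same check programmatically is to list which fields on the struct are not None. This is a minimal sketch only: `SimpleNamespace` stands in for the deserialized `TExecSummary`, the helper name `populated_fields` is made up for illustration, and it assumes the real object keeps its fields as plain instance attributes so that `vars()` works on it.

```python
from types import SimpleNamespace


def populated_fields(obj):
    """Return the sorted names of attributes on `obj` that are not None."""
    return sorted(name for name, value in vars(obj).items() if value is not None)


# Stand-in carrying the same values as the exec_summary printed above.
exec_summary = SimpleNamespace(
    status=None, is_queued=None, queued_reason=None, error_logs=None,
    state=0, progress=None, nodes=None, exch_to_sender_map=None,
)

print(populated_fields(exec_summary))  # ['state']
```

Note that `state=0` counts as populated here because the check is against None, not truthiness.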

 

 

@hsri it seems like this merits some more investigation. This was added as a nicety a little while back, but it may not be working as expected. If you can reproduce this with a simple query, could you file a bug on Apache Impala? https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala
