Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Calculation of "constants"

avatar
Explorer

I've read query plans in SQL Server forever using a GUI, but text plans are like Greek to me (I don't speak Greek).  I'm trying to figure out if statements like this calculate the date "constant" for each row and cause performance lag, or if Impala knows to calculate it once and apply to all rows.

 

select *
from myTable
-- the "months_add" is basically a constant, but I can see 
-- why a function might be calculated for each row
where tran_timestamp > months_add(now(), -1)
limit 1;

select *
from myTable
-- but what about this?  this is definitely a constant
-- so is this different than the query above and is
-- the "constant" calculated for each row?
where tran_timestamp > months_add('2021-01-01', -1)
limit 1




1 ACCEPTED SOLUTION

avatar
Expert Contributor

The calculations on constants should be processed in the compiling phase. If you have a look at the query plans as below, you can see that both "months_add(now(), -1)" and "months_add('2020-01-01', -1)" are evaluated already. Impala doesn't need to repeat evaluation for each row.

[localhost:21000] default> explain select * from mytable where tran_timestamp > months_add(now(), -1);
Query: explain select * from mytable where tran_timestamp > months_add(now(), -1)
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B Threads=2                             |
| Per-Host Resource Estimates: Memory=10MB                                           |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.mytable                                                                    |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 01:EXCHANGE [UNPARTITIONED]                                                        |
| |                                                                                  |
| 00:SCAN HDFS [default.mytable]                                                     |
|    HDFS partitions=1/1 files=0 size=0B                                             |
|    predicates: tran_timestamp > TIMESTAMP '2021-06-30 17:08:33.432148000'          |
|    row-size=16B cardinality=0                                                      |
+------------------------------------------------------------------------------------+
Fetched 13 row(s) in 0.99s
[localhost:21000] default> explain select * from mytable where tran_timestamp > months_add('2020-01-01', -1);
Query: explain select * from mytable where tran_timestamp > months_add('2020-01-01', -1)
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B Threads=2                             |
| Per-Host Resource Estimates: Memory=10MB                                           |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.mytable                                                                    |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 01:EXCHANGE [UNPARTITIONED]                                                        |
| |                                                                                  |
| 00:SCAN HDFS [default.mytable]                                                     |
|    HDFS partitions=1/1 files=0 size=0B                                             |
|    predicates: tran_timestamp > TIMESTAMP '2019-12-01 00:00:00'                    |
|    row-size=16B cardinality=0                                                      |
+------------------------------------------------------------------------------------+
Fetched 13 row(s) in 0.02s

 

View solution in original post

2 REPLIES 2

avatar
Expert Contributor

The calculations on constants should be processed in the compiling phase. If you have a look at the query plans as below, you can see that both "months_add(now(), -1)" and "months_add('2020-01-01', -1)" are evaluated already. Impala doesn't need to repeat evaluation for each row.

[localhost:21000] default> explain select * from mytable where tran_timestamp > months_add(now(), -1);
Query: explain select * from mytable where tran_timestamp > months_add(now(), -1)
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B Threads=2                             |
| Per-Host Resource Estimates: Memory=10MB                                           |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.mytable                                                                    |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 01:EXCHANGE [UNPARTITIONED]                                                        |
| |                                                                                  |
| 00:SCAN HDFS [default.mytable]                                                     |
|    HDFS partitions=1/1 files=0 size=0B                                             |
|    predicates: tran_timestamp > TIMESTAMP '2021-06-30 17:08:33.432148000'          |
|    row-size=16B cardinality=0                                                      |
+------------------------------------------------------------------------------------+
Fetched 13 row(s) in 0.99s
[localhost:21000] default> explain select * from mytable where tran_timestamp > months_add('2020-01-01', -1);
Query: explain select * from mytable where tran_timestamp > months_add('2020-01-01', -1)
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B Threads=2                             |
| Per-Host Resource Estimates: Memory=10MB                                           |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.mytable                                                                    |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 01:EXCHANGE [UNPARTITIONED]                                                        |
| |                                                                                  |
| 00:SCAN HDFS [default.mytable]                                                     |
|    HDFS partitions=1/1 files=0 size=0B                                             |
|    predicates: tran_timestamp > TIMESTAMP '2019-12-01 00:00:00'                    |
|    row-size=16B cardinality=0                                                      |
+------------------------------------------------------------------------------------+
Fetched 13 row(s) in 0.02s

 

avatar
Explorer

EXACTLY what I needed.  Thank you @robbiez !