Reply
New Contributor
Posts: 2
Registered: ‎05-30-2018

IN Clause Leads to Slower Performance

Hi All,

 

We are facing a scenario where are applying a filter condition of multiple values in the Where Clause.

 

Ex: Table Structure : User,TimeStamp,Value

Primary Key : User,Timestamp

Partition by User Hash Technique.

 

Query 1 : Select Max(Timestamp) from Tablea Where User='User1';

--output in 0.35 Seconds

 

Query 2: Select Max(Timestamp) from Tablea Where User in ('User1'); 

--Output in 44.4 Seconds

 

 

 

Any direction would really help

Cloudera Employee
Posts: 19
Registered: ‎09-28-2015

Re: IN Clause Leads to Slower Performance

[ Edited ]

Hi @Alpha_Sunil, that's definitely a surprising result.  An equality predicate and a single-value IN clause should be executed in exactly the same way.  Are you sure that Query 2 which takes 44 seconds only has a single value?  If so we should be able to isolate the issue if you attach a query profile (see these docs for more info).

 

Note that when applied to the PK coclumn, multi-value IN clauses can take significantly longer than multiple queries using equality predicates because Kudu hasn't yet implemented an important optimization, this is tracked in KUDU-1645.

New Contributor
Posts: 2
Registered: ‎05-30-2018

Re: IN Clause Leads to Slower Performance

Query Output.jpgHi Dan,

 

Thats the extact behaviour been observed. Attached the outputs of the query. Couple of values masked- sensitive info.

 

 

 

 

 

Please find the query plan as well. The query plan is exactly the same.

 

Query: explain select well_num,max(timepoint) from timedata where well_num=ASCJSVSA group by well_num
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=256.00MB VCores=2                          |
|                                                                                    |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 04:EXCHANGE [UNPARTITIONED]                                                        |
| |                                                                                  |
| 03:AGGREGATE [FINALIZE]                                                            |
| |  output: max:merge(timepoint)                                                    |
| |  group by: well_num                                                              |
| |                                                                                  |
| 02:EXCHANGE [HASH(well_num)]                                                       |
| |                                                                                  |
| 01:AGGREGATE [STREAMING]                                                           |
| |  output: max(timepoint)                                                          |
| |  group by: well_num                                                              |
| |                                                                                  |
| 00:SCAN KUDU [timedata]                                                            |
|    kudu predicates: well_num = ASCJSVSA                                          |
+------------------------------------------------------------------------------------+

 

Query: explain select well_num,max(timepoint) from timedata where well_num in (ASCJSVSA) group by well_num
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=256.00MB VCores=2                          |
|                |
|                                                                                    |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 04:EXCHANGE [UNPARTITIONED]                                                        |
| |                                                                                  |
| 03:AGGREGATE [FINALIZE]                                                            |
| |  output: max:merge(timepoint)                                                    |
| |  group by: well_num                                                              |
| |                                                                                  |
| 02:EXCHANGE [HASH(well_num)]                                                       |
| |                                                                                  |
| 01:AGGREGATE [STREAMING]                                                           |
| |  output: max(timepoint)                                                          |
| |  group by: well_num                                                              |
| |                                                                                  |
| 00:SCAN KUDU [timedata]                                                            |
|    predicates: well_num IN (ASCJSVSA)                                            |
+------------------------------------------------------------------------------------+

Highlighted
Cloudera Employee
Posts: 19
Registered: ‎09-28-2015

Re: IN Clause Leads to Slower Performance

Hi @Alpha_Sunil, sorry for the slow response.  The query results and the explain aren't exactly the same in the predicate they are applying.  It looks like the queries are providing an integer value, while the EXPLAIN is using 'ASCJSVSA'.  Please attach the 'SHOW CREATE TABLE <table>;' output so I can understand what the column types, primary key, and partitioning that are applied to this table.  Additionally, what does the EXPLAIN output look like when using the 15242... value?  I'm thinking that perhaps Impala is not able to push down the predicate in some case due to casting/type issues.

Announcements