Support Questions

Find answers, ask questions, and share your expertise

Support uncorrelated subqueries in the WHERE clause

avatar
Explorer

 

Our Hive is Hive 1.1.0-cdh5.14.2. But still  uncorrelated subqueries in the WHERE clause is not working as per

https://issues.apache.org/jira/browse/HIVE-784. Can you please help? Thanks!

1 REPLY 1

avatar

Hi, I have tested the correlated and uncorrelated subqueries in Hive CDH 5.13 and you are right.

 

 

CORRELATED:

First the data and test in Impala:

[10.197.0.0:21000] > create table work.tbla ( id int, s string );
[10.197.0.0:21000] > create table work.tblb ( id int, s string );
[10.197.0.0:21000] > insert into work.tbla select 100, '2008-01-01' union select 200, '2009-12-01';
Modified 2 row(s) in 3.69s
[10.197.0.0:21000] > insert into work.tblb select 100, '2008-01-01' union select 200, '2009-12-01';
Modified 2 row(s) in 3.63s
[10.197.0.0:21000] > select * from work.tbla a where exists ( select null from work.tblb b where a.id = b.id and b.s > '2009-10-01' );
+-----+------------+
| id  | s          |
+-----+------------+
| 200 | 2009-12-01 |
+-----+------------+

Now in Hive:

 

0: jdbc:hive2://xxx.com> explain select * from work.tbla a where exists ( select null from work.tblb b where a.id = b.id and b.s > '2009-10-01' );
..
INFO  : OK
+----------------------------------------------------+--+
|                      Explain                       |
+----------------------------------------------------+--+
| STAGE DEPENDENCIES:                                |
|   Stage-4 is a root stage                          |
|   Stage-3 depends on stages: Stage-4               |
|   Stage-0 depends on stages: Stage-3               |
|                                                    |
| STAGE PLANS:                                       |
|   Stage: Stage-4                                   |
|     Map Reduce Local Work                          |
|       Alias -> Map Local Tables:                   |
|         sq_1:b                                     |
|           Fetch Operator                           |
|             limit: -1                              |
|       Alias -> Map Local Operator Tree:            |
|         sq_1:b                                     |
|           TableScan                                |
|             alias: b                               |
|             filterExpr: ((s > '2009-10-01') and id is not null) (type: boolean) |
|             Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE |
|             Filter Operator                        |
|               predicate: ((s > '2009-10-01') and id is not null) (type: boolean) |
|               Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE |
|               Select Operator                      |
|                 expressions: id (type: int)        |
|                 outputColumnNames: _col1           |
|                 Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE |
|                 Group By Operator                  |
|                   keys: _col1 (type: int)          |
|                   mode: hash                       |
|                   outputColumnNames: _col0         |
|                   Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE |
|                   HashTable Sink Operator          |
|                     keys:                          |
|                       0 id (type: int)             |
|                       1 _col0 (type: int)          |
|                                                    |
|   Stage: Stage-3                                   |
|     Map Reduce                                     |
|       Map Operator Tree:                           |
|           TableScan                                |
|             alias: a                               |
|             filterExpr: id is not null (type: boolean) |
|             Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE |
|             Filter Operator                        |
|               predicate: id is not null (type: boolean) |
|               Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE |
|               Map Join Operator                    |
|                 condition map:                     |
|                      Left Semi Join 0 to 1         |
|                 keys:                              |
|                   0 id (type: int)                 |
|                   1 _col0 (type: int)              |
|                 outputColumnNames: _col0, _col1    |
|                 Statistics: Num rows: 1 Data size: 33 Basic stats: COMPLETE Column stats: NONE |
|                 File Output Operator               |
|                   compressed: false                |
|                   Statistics: Num rows: 1 Data size: 33 Basic stats: COMPLETE Column stats: NONE |
|                   table:                           |
|                       input format: org.apache.hadoop.mapred.TextInputFormat |
|                       output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|       Local Work:                                  |
|         Map Reduce Local Work                      |
|                                                    |
|   Stage: Stage-0                                   |
|     Fetch Operator                                 |
|       limit: -1                                    |
|       Processor Tree:                              |
|         ListSink                                   |
|                                                    |
+----------------------------------------------------+--+
69 rows selected (0.311 seconds)

UNCORRELATED - Hive fails:

 

0: jdbc:hive2://xxxx.com> explain select * from work.tbla a where exists ( select null from work.tblb b where  b.s > '2009-10-01' );
Error: Error while compiling statement: FAILED: SemanticException Line 1:65 Invalid SubQuery expression ''2009-10-01'' in definition of SubQuery sq_1 [
exists ( select null from work.tblb b where  b.s > '2009-10-01' )
] used as sq_1 at Line 1:39: For Exists/Not Exists operator SubQuery must be Correlated. (state=42000,code=40000)

Impala ok

[10.197.0.0:21000] > select * from work.tbla a where exists ( select null from work.tblb b where  b.s > '2009-10-01' );
+-----+------------+
| id  | s          |
+-----+------------+
| 100 | 2008-01-01 |
| 200 | 2009-12-01 |
+-----+------------+
Fetched 2 row(s) in 0.19s