Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Support uncorrelated subqueries in the WHERE clause

avatar
Explorer

 

Our Hive is Hive 1.1.0-cdh5.14.2. But still  uncorrelated subqueries in the WHERE clause is not working as per

https://issues.apache.org/jira/browse/HIVE-784. Can you please help? Thanks!

1 REPLY 1

avatar

Hi, I have tested the correlated and uncorrelated subqueries in Hive CDH 5.13 and you are right.

 

 

CORRELATED:

First the data and test in Impala:

[10.197.0.0:21000] > create table work.tbla ( id int, s string );
[10.197.0.0:21000] > create table work.tblb ( id int, s string );
[10.197.0.0:21000] > insert into work.tbla select 100, '2008-01-01' union select 200, '2009-12-01';
Modified 2 row(s) in 3.69s
[10.197.0.0:21000] > insert into work.tblb select 100, '2008-01-01' union select 200, '2009-12-01';
Modified 2 row(s) in 3.63s
[10.197.0.0:21000] > select * from work.tbla a where exists ( select null from work.tblb b where a.id = b.id and b.s > '2009-10-01' );
+-----+------------+
| id  | s          |
+-----+------------+
| 200 | 2009-12-01 |
+-----+------------+

Now in Hive:

 

0: jdbc:hive2://xxx.com> explain select * from work.tbla a where exists ( select null from work.tblb b where a.id = b.id and b.s > '2009-10-01' );
..
INFO  : OK
+----------------------------------------------------+--+
|                      Explain                       |
+----------------------------------------------------+--+
| STAGE DEPENDENCIES:                                |
|   Stage-4 is a root stage                          |
|   Stage-3 depends on stages: Stage-4               |
|   Stage-0 depends on stages: Stage-3               |
|                                                    |
| STAGE PLANS:                                       |
|   Stage: Stage-4                                   |
|     Map Reduce Local Work                          |
|       Alias -> Map Local Tables:                   |
|         sq_1:b                                     |
|           Fetch Operator                           |
|             limit: -1                              |
|       Alias -> Map Local Operator Tree:            |
|         sq_1:b                                     |
|           TableScan                                |
|             alias: b                               |
|             filterExpr: ((s > '2009-10-01') and id is not null) (type: boolean) |
|             Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE |
|             Filter Operator                        |
|               predicate: ((s > '2009-10-01') and id is not null) (type: boolean) |
|               Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE |
|               Select Operator                      |
|                 expressions: id (type: int)        |
|                 outputColumnNames: _col1           |
|                 Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE |
|                 Group By Operator                  |
|                   keys: _col1 (type: int)          |
|                   mode: hash                       |
|                   outputColumnNames: _col0         |
|                   Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE |
|                   HashTable Sink Operator          |
|                     keys:                          |
|                       0 id (type: int)             |
|                       1 _col0 (type: int)          |
|                                                    |
|   Stage: Stage-3                                   |
|     Map Reduce                                     |
|       Map Operator Tree:                           |
|           TableScan                                |
|             alias: a                               |
|             filterExpr: id is not null (type: boolean) |
|             Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE |
|             Filter Operator                        |
|               predicate: id is not null (type: boolean) |
|               Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE |
|               Map Join Operator                    |
|                 condition map:                     |
|                      Left Semi Join 0 to 1         |
|                 keys:                              |
|                   0 id (type: int)                 |
|                   1 _col0 (type: int)              |
|                 outputColumnNames: _col0, _col1    |
|                 Statistics: Num rows: 1 Data size: 33 Basic stats: COMPLETE Column stats: NONE |
|                 File Output Operator               |
|                   compressed: false                |
|                   Statistics: Num rows: 1 Data size: 33 Basic stats: COMPLETE Column stats: NONE |
|                   table:                           |
|                       input format: org.apache.hadoop.mapred.TextInputFormat |
|                       output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|       Local Work:                                  |
|         Map Reduce Local Work                      |
|                                                    |
|   Stage: Stage-0                                   |
|     Fetch Operator                                 |
|       limit: -1                                    |
|       Processor Tree:                              |
|         ListSink                                   |
|                                                    |
+----------------------------------------------------+--+
69 rows selected (0.311 seconds)

UNCORRELATED - Hive fails:

 

0: jdbc:hive2://xxxx.com> explain select * from work.tbla a where exists ( select null from work.tblb b where  b.s > '2009-10-01' );
Error: Error while compiling statement: FAILED: SemanticException Line 1:65 Invalid SubQuery expression ''2009-10-01'' in definition of SubQuery sq_1 [
exists ( select null from work.tblb b where  b.s > '2009-10-01' )
] used as sq_1 at Line 1:39: For Exists/Not Exists operator SubQuery must be Correlated. (state=42000,code=40000)

Impala ok

[10.197.0.0:21000] > select * from work.tbla a where exists ( select null from work.tblb b where  b.s > '2009-10-01' );
+-----+------------+
| id  | s          |
+-----+------------+
| 100 | 2008-01-01 |
| 200 | 2009-12-01 |
+-----+------------+
Fetched 2 row(s) in 0.19s