Created on 09-19-2018 10:57 AM - edited 09-16-2022 06:43 AM
Our Hive is Hive 1.1.0-cdh5.14.2. But still uncorrelated subqueries in the WHERE clause is not working as per
https://issues.apache.org/jira/browse/HIVE-784. Can you please help? Thanks!
Created 09-20-2018 12:02 PM
Hi, I have tested the correlated and uncorrelated subqueries in Hive CDH 5.13 and you are right.
CORRELATED:
First the data and test in Impala:
[10.197.0.0:21000] > create table work.tbla ( id int, s string ); [10.197.0.0:21000] > create table work.tblb ( id int, s string ); [10.197.0.0:21000] > insert into work.tbla select 100, '2008-01-01' union select 200, '2009-12-01'; Modified 2 row(s) in 3.69s [10.197.0.0:21000] > insert into work.tblb select 100, '2008-01-01' union select 200, '2009-12-01'; Modified 2 row(s) in 3.63s [10.197.0.0:21000] > select * from work.tbla a where exists ( select null from work.tblb b where a.id = b.id and b.s > '2009-10-01' ); +-----+------------+ | id | s | +-----+------------+ | 200 | 2009-12-01 | +-----+------------+
Now in Hive:
0: jdbc:hive2://xxx.com> explain select * from work.tbla a where exists ( select null from work.tblb b where a.id = b.id and b.s > '2009-10-01' ); .. INFO : OK +----------------------------------------------------+--+ | Explain | +----------------------------------------------------+--+ | STAGE DEPENDENCIES: | | Stage-4 is a root stage | | Stage-3 depends on stages: Stage-4 | | Stage-0 depends on stages: Stage-3 | | | | STAGE PLANS: | | Stage: Stage-4 | | Map Reduce Local Work | | Alias -> Map Local Tables: | | sq_1:b | | Fetch Operator | | limit: -1 | | Alias -> Map Local Operator Tree: | | sq_1:b | | TableScan | | alias: b | | filterExpr: ((s > '2009-10-01') and id is not null) (type: boolean) | | Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE | | Filter Operator | | predicate: ((s > '2009-10-01') and id is not null) (type: boolean) | | Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE | | Select Operator | | expressions: id (type: int) | | outputColumnNames: _col1 | | Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE | | Group By Operator | | keys: _col1 (type: int) | | mode: hash | | outputColumnNames: _col0 | | Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE | | HashTable Sink Operator | | keys: | | 0 id (type: int) | | 1 _col0 (type: int) | | | | Stage: Stage-3 | | Map Reduce | | Map Operator Tree: | | TableScan | | alias: a | | filterExpr: id is not null (type: boolean) | | Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE | | Filter Operator | | predicate: id is not null (type: boolean) | | Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE | | Map Join Operator | | condition map: | | Left Semi Join 0 to 1 | | keys: | | 0 id (type: int) | | 1 _col0 (type: int) | | outputColumnNames: _col0, _col1 | | Statistics: Num rows: 1 Data size: 33 Basic stats: COMPLETE Column stats: NONE | | File Output Operator | | compressed: false | | Statistics: Num rows: 1 Data size: 33 Basic stats: COMPLETE Column stats: NONE | | table: | | input format: org.apache.hadoop.mapred.TextInputFormat | | output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | | serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | | Local Work: | | Map Reduce Local Work | | | | Stage: Stage-0 | | Fetch Operator | | limit: -1 | | Processor Tree: | | ListSink | | | +----------------------------------------------------+--+ 69 rows selected (0.311 seconds)
UNCORRELATED - Hive fails:
0: jdbc:hive2://xxxx.com> explain select * from work.tbla a where exists ( select null from work.tblb b where b.s > '2009-10-01' ); Error: Error while compiling statement: FAILED: SemanticException Line 1:65 Invalid SubQuery expression ''2009-10-01'' in definition of SubQuery sq_1 [ exists ( select null from work.tblb b where b.s > '2009-10-01' ) ] used as sq_1 at Line 1:39: For Exists/Not Exists operator SubQuery must be Correlated. (state=42000,code=40000)
Impala ok
[10.197.0.0:21000] > select * from work.tbla a where exists ( select null from work.tblb b where b.s > '2009-10-01' ); +-----+------------+ | id | s | +-----+------------+ | 100 | 2008-01-01 | | 200 | 2009-12-01 | +-----+------------+ Fetched 2 row(s) in 0.19s