Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

select not in not working

avatar
New Contributor

Hi all,

 

I am trying to get a list of IDs from a source while excluding a few select subjects.

 

so here are the queries:

 

select count(distinct idvar) from db1.source;

144252637

 

select count(distinct idvar) from db1.source where idvar in (select distinct idvar from db2.list);

 

710610

 

select count(distinct idvar) from db1.source where idvar not in (select distinct idvar from db2.list);

0

Please excuse the formatting in this message.  Something is not adding up here.  I would like some help please.

 

Thanks!

1 ACCEPTED SOLUTION

avatar

Are there any NULLs in idvar? If so, you could be getting tripped up by the interaction between NOT IN and NULL values. One interesting quirk of SQL is that in some cases IN and NOT IN can both be false for the same row and subquery. E.g. I can recreate a similar scenario if the only value in the subquery is a NULL.

 

[localhost:21000] > select count(distinct int_col) from functional.alltypestiny;
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 2                       |
+-------------------------+
[localhost:21000] > select count(distinct int_col) from functional.alltypestiny t1 where int_col in (select distinct int_col from functional.alltypesagg where int_col is null);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 0                       |
+-------------------------+
[localhost:21000] > select count(distinct int_col) from functional.alltypestiny t1 where int_col not in (select distinct int_col from functional.alltypesagg where int_col is null);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 0                       |
+-------------------------+

I suspect it might be easier to understand if you use a NOT EXISTS. It is almost equivalent to NOT IN but the handling of NULL values is more intuitive.

 

[localhost:21000] > select count(distinct int_col) from functional.alltypestiny t1 where not exists(select distinct int_col from functional.alltypesagg t2 where int_col is null and t1.int_col = t2.int_col);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 2                       |
+-------------------------+
[localhost:21000] > select count(distinct int_col) from functional.alltypestiny t1 where exists(select distinct int_col from functional.alltypesagg t2 where int_col is null and t1.int_col = t2.int_col);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 0                       |
+-------------------------+

View solution in original post

1 REPLY 1

avatar

Are there any NULLs in idvar? If so, you could be getting tripped up by the interaction between NOT IN and NULL values. One interesting quirk of SQL is that in some cases IN and NOT IN can both be false for the same row and subquery. E.g. I can recreate a similar scenario if the only value in the subquery is a NULL.

 

[localhost:21000] > select count(distinct int_col) from functional.alltypestiny;
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 2                       |
+-------------------------+
[localhost:21000] > select count(distinct int_col) from functional.alltypestiny t1 where int_col in (select distinct int_col from functional.alltypesagg where int_col is null);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 0                       |
+-------------------------+
[localhost:21000] > select count(distinct int_col) from functional.alltypestiny t1 where int_col not in (select distinct int_col from functional.alltypesagg where int_col is null);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 0                       |
+-------------------------+

I suspect it might be easier to understand if you use a NOT EXISTS. It is almost equivalent to NOT IN but the handling of NULL values is more intuitive.

 

[localhost:21000] > select count(distinct int_col) from functional.alltypestiny t1 where not exists(select distinct int_col from functional.alltypesagg t2 where int_col is null and t1.int_col = t2.int_col);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 2                       |
+-------------------------+
[localhost:21000] > select count(distinct int_col) from functional.alltypestiny t1 where exists(select distinct int_col from functional.alltypesagg t2 where int_col is null and t1.int_col = t2.int_col);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 0                       |
+-------------------------+