<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: select not in not working in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/select-not-in-not-working/m-p/65784#M13555</link>
    <description>&lt;P&gt;Are there any NULLs in idvar? If so, you could be getting tripped up by the interaction between NOT IN and NULL values. One interesting quirk of SQL is that in some cases IN and NOT IN can both be false for the same row and subquery. E.g. I can recreate a similar scenario if the only value in the subquery is a NULL.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;[localhost:21000] &amp;gt; select count(distinct int_col) from functional.alltypestiny;
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 2                       |
+-------------------------+
[localhost:21000] &amp;gt; select count(distinct int_col) from functional.alltypestiny t1 where int_col in (select distinct int_col from functional.alltypesagg where int_col is null);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 0                       |
+-------------------------+
[localhost:21000] &amp;gt; select count(distinct int_col) from functional.alltypestiny t1 where int_col not in (select distinct int_col from functional.alltypesagg where int_col is null);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 0                       |
+-------------------------+&lt;/PRE&gt;&lt;P&gt;I suspect it might be easier to understand if you use a NOT EXISTS. It is almost equivalent to NOT IN but the handling of NULL values is more intuitive.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;[localhost:21000] &amp;gt; select count(distinct int_col) from functional.alltypestiny t1 where not exists(select distinct int_col from functional.alltypesagg t2 where int_col is null and t1.int_col = t2.int_col);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 2                       |
+-------------------------+
[localhost:21000] &amp;gt; select count(distinct int_col) from functional.alltypestiny t1 where exists(select distinct int_col from functional.alltypesagg t2 where int_col is null and t1.int_col = t2.int_col);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 0                       |
+-------------------------+&lt;/PRE&gt;</description>
    <pubDate>Mon, 26 Mar 2018 21:47:35 GMT</pubDate>
    <dc:creator>Tim Armstrong</dc:creator>
    <dc:date>2018-03-26T21:47:35Z</dc:date>
    <item>
      <title>select not in not working</title>
      <link>https://community.cloudera.com/t5/Support-Questions/select-not-in-not-working/m-p/65783#M13554</link>
      <description>&lt;P&gt;&lt;FONT face="andale mono,times" size="5"&gt;Hi all,&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times" size="5"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times" size="5"&gt;I am trying to get a list of IDs from a source while excluding a few select subjects.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times" size="5"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times" size="5"&gt;so here are the queries:&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times" size="5"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times" size="5"&gt;select count(distinct idvar) from db1.source;&lt;/FONT&gt;&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;FONT face="andale mono,times" size="5"&gt;144252637&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;FONT face="andale mono,times" size="5"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times" size="5"&gt;select count(distinct idvar) from db1.source where idvar in (select distinct idvar from db2.list);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times" size="5"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times" size="5"&gt;&lt;SPAN&gt;710610&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times" size="5"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times" size="5"&gt;&lt;SPAN&gt;select count(distinct idvar) from db1.source where idvar not&amp;nbsp;in (select distinct idvar from db2.list);&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times" size="5"&gt;&lt;SPAN&gt;0&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times" size="5"&gt;&lt;SPAN&gt;&lt;SPAN&gt;Please excuse the formatting in this message.&amp;nbsp; Something is not adding up here.&amp;nbsp; I would like some help please.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times" size="5"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times" size="5"&gt;&lt;SPAN&gt;&lt;SPAN&gt;Thanks!&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 13:02:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/select-not-in-not-working/m-p/65783#M13554</guid>
      <dc:creator>huehuehue</dc:creator>
      <dc:date>2022-09-16T13:02:03Z</dc:date>
    </item>
    <item>
      <title>Re: select not in not working</title>
      <link>https://community.cloudera.com/t5/Support-Questions/select-not-in-not-working/m-p/65784#M13555</link>
      <description>&lt;P&gt;Are there any NULLs in idvar? If so, you could be getting tripped up by the interaction between NOT IN and NULL values. One interesting quirk of SQL is that in some cases IN and NOT IN can both be false for the same row and subquery. E.g. I can recreate a similar scenario if the only value in the subquery is a NULL.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;[localhost:21000] &amp;gt; select count(distinct int_col) from functional.alltypestiny;
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 2                       |
+-------------------------+
[localhost:21000] &amp;gt; select count(distinct int_col) from functional.alltypestiny t1 where int_col in (select distinct int_col from functional.alltypesagg where int_col is null);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 0                       |
+-------------------------+
[localhost:21000] &amp;gt; select count(distinct int_col) from functional.alltypestiny t1 where int_col not in (select distinct int_col from functional.alltypesagg where int_col is null);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 0                       |
+-------------------------+&lt;/PRE&gt;&lt;P&gt;I suspect it might be easier to understand if you use a NOT EXISTS. It is almost equivalent to NOT IN but the handling of NULL values is more intuitive.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;[localhost:21000] &amp;gt; select count(distinct int_col) from functional.alltypestiny t1 where not exists(select distinct int_col from functional.alltypesagg t2 where int_col is null and t1.int_col = t2.int_col);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 2                       |
+-------------------------+
[localhost:21000] &amp;gt; select count(distinct int_col) from functional.alltypestiny t1 where exists(select distinct int_col from functional.alltypesagg t2 where int_col is null and t1.int_col = t2.int_col);
+-------------------------+
| count(distinct int_col) |
+-------------------------+
| 0                       |
+-------------------------+&lt;/PRE&gt;</description>
      <pubDate>Mon, 26 Mar 2018 21:47:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/select-not-in-not-working/m-p/65784#M13555</guid>
      <dc:creator>Tim Armstrong</dc:creator>
      <dc:date>2018-03-26T21:47:35Z</dc:date>
    </item>
  </channel>
</rss>

