<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Null check query in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Null-check-query/m-p/367167#M239821</link>
    <description>&lt;P&gt;Thanks so much,&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/12885"&gt;@mszurap&lt;/a&gt;! Your query worked amazingly efficiently!&lt;/P&gt;</description>
    <pubDate>Wed, 29 Mar 2023 15:41:12 GMT</pubDate>
    <dc:creator>Supernova</dc:creator>
    <dc:date>2023-03-29T15:41:12Z</dc:date>
    <item>
      <title>Null check query</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Null-check-query/m-p/367012#M239750</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm trying to write a data quality query that efficiently counts the number of nulls in particular columns. The query I currently use is below (I found a related example here on a related topic):&lt;/P&gt;&lt;P&gt;SELECT CustFirstName, CustLastName&lt;/P&gt;&lt;P&gt;,CASE WHEN CustFirstName is null then 1 else 0 end CustFirstNameNullCheck&lt;/P&gt;&lt;P&gt;,CASE WHEN CustLastName is null then 1 else 0 end CustLastNameNullCheck&lt;/P&gt;&lt;P&gt;FROM SchemaName.DbName&lt;/P&gt;&lt;P&gt;WHERE Date = '2023-03-15'&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The output that I get is:&lt;/P&gt;&lt;TABLE border="1" width="99.7159090909091%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="25%"&gt;CustFirstName&lt;/TD&gt;&lt;TD width="25%"&gt;CustLastName&lt;/TD&gt;&lt;TD width="25%"&gt;CustFirstNameNullCheck&lt;/TD&gt;&lt;TD width="25%"&gt;CustLastNameNullCheck&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="25%"&gt;{null}&lt;/TD&gt;&lt;TD width="25%"&gt;{null}&lt;/TD&gt;&lt;TD width="25%"&gt;1&lt;/TD&gt;&lt;TD width="25%"&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And then &lt;U&gt;hundreds of thousands&lt;/U&gt; of single rows of {nulls} follow (I know, it's a long story).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Anyhow, the output that I would LIKE to get, please, is a much simpler layout with just the column (or an alias) and a count of the total number of nulls:&lt;/P&gt;&lt;TABLE border="1" width="99.7159090909091%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="50%" height="30px"&gt;CustFirstName&lt;/TD&gt;&lt;TD width="50%" height="30px"&gt;CustLastName&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="50%" height="30px"&gt;500,000 (nulls)&lt;/TD&gt;&lt;TD width="50%" height="30px"&gt;500,000 (nulls)&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've tried different iterations of SELECT COUNT(*) and just could not get the query right. MapR gives a gigantic error message when trying to use it. Thanks in advance!&lt;/P&gt;</description>
      <pubDate>Mon, 27 Mar 2023 15:34:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Null-check-query/m-p/367012#M239750</guid>
      <dc:creator>Supernova</dc:creator>
      <dc:date>2023-03-27T15:34:26Z</dc:date>
    </item>
    <item>
      <title>Re: Null check query</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Null-check-query/m-p/367019#M239756</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/104244"&gt;@Supernova&lt;/a&gt;&amp;nbsp;Welcome to the Cloudera Community!&lt;BR /&gt;&lt;BR /&gt;To help you get the best possible solution, I have tagged our Hive experts&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/70785"&gt;@Shmoo&lt;/a&gt;&amp;nbsp;and&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/12885"&gt;@mszurap&lt;/a&gt;&amp;nbsp; who may be able to assist you further.&lt;BR /&gt;&lt;BR /&gt;Please keep us updated on your post, and we hope you find a satisfactory solution to your query.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Mar 2023 17:53:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Null-check-query/m-p/367019#M239756</guid>
      <dc:creator>DianaTorres</dc:creator>
      <dc:date>2023-03-27T17:53:28Z</dc:date>
    </item>
    <item>
      <title>Re: Null check query</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Null-check-query/m-p/367091#M239781</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/104244"&gt;@Supernova&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;There are probably multiple solutions. For example, with a subquery you can use the following:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;select concat(cast(sum(CustFirstNameNullCheck) as bigint), ' (nulls)') as CustFirstName,
       concat(cast(sum(CustLastNameNullCheck) as bigint), ' (nulls)') as CustLastName
from (
  select CASE WHEN CustFirstName is null then 1 else 0 end CustFirstNameNullCheck,
         CASE WHEN CustLastName is null then 1 else 0 end CustLastNameNullCheck
  from SchemaName.DbName
  WHERE Date = '2023-03-15'
) a;&lt;/LI-CODE&gt;&lt;P&gt;Hope this helps.&lt;/P&gt;&lt;P&gt;Best regards, Miklos&lt;/P&gt;</description>
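      The null-flag-then-sum idea in the reply above can be tried locally. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Hive, the table name customers and the helper count_nulls are hypothetical, and it folds the flag and the sum into one SELECT instead of a subquery. Hive behaviour may differ in details.

```python
import sqlite3


def count_nulls(rows, day):
    """Return (first_name_nulls, last_name_nulls) for one date.

    rows is a list of (CustFirstName, CustLastName, Date) tuples.
    SQLite is used here only as a local stand-in for Hive.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (CustFirstName TEXT, CustLastName TEXT, Date TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    # Each CASE maps a row to 1 (null) or 0 (not null); SUM adds the flags.
    # COALESCE turns the NULL that SUM yields on an empty selection into 0.
    query = """
        SELECT
            COALESCE(SUM(CASE WHEN CustFirstName IS NULL THEN 1 ELSE 0 END), 0),
            COALESCE(SUM(CASE WHEN CustLastName IS NULL THEN 1 ELSE 0 END), 0)
        FROM customers
        WHERE Date = ?
    """
    result = conn.execute(query, (day,)).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (None, None, "2023-03-15"),
        ("Ada", None, "2023-03-15"),
        ("Grace", "Hopper", "2023-03-15"),
        (None, None, "2023-03-14"),
    ]
    print(count_nulls(sample, "2023-03-15"))  # (1, 2)
```

      The result is a single row with one count per column, which is the layout asked for in the original question.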
      <pubDate>Tue, 28 Mar 2023 13:43:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Null-check-query/m-p/367091#M239781</guid>
      <dc:creator>mszurap</dc:creator>
      <dc:date>2023-03-28T13:43:57Z</dc:date>
    </item>
    <item>
      <title>Re: Null check query</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Null-check-query/m-p/367167#M239821</link>
      <description>&lt;P&gt;Thanks so much,&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/12885"&gt;@mszurap&lt;/a&gt;! Your query worked amazingly efficiently!&lt;/P&gt;</description>
      <pubDate>Wed, 29 Mar 2023 15:41:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Null-check-query/m-p/367167#M239821</guid>
      <dc:creator>Supernova</dc:creator>
      <dc:date>2023-03-29T15:41:12Z</dc:date>
    </item>
    <item>
      <title>Re: Null check query</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Null-check-query/m-p/367169#M239822</link>
      <description>&lt;P&gt;Great, glad to hear that it was helpful.&lt;/P&gt;&lt;P&gt;Actually, I was thinking about using the NVL function, but in Hive it does not offer a value for the "else" part, unlike Impala's NVL2 function:&lt;/P&gt;&lt;P&gt;&lt;A href="https://impala.apache.org/docs/build/asf-site-html/topics/impala_conditional_functions.html#conditional_functions__nvl2" target="_blank" rel="noopener"&gt;https://impala.apache.org/docs/build/asf-site-html/topics/impala_conditional_functions.html#conditional_functions__nvl2&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;With that, the query would be much simpler (no need for CASE WHEN ... THEN ... ELSE ... END), just "NVL2(CustFirstName, 0, 1)":&lt;/P&gt;&lt;PRE&gt;SELECT NVL2('ABC', 'Is Not Null', 'Is Null'); -- Returns 'Is Not Null'&lt;/PRE&gt;&lt;P&gt;Again, this is for Impala; unfortunately, Hive does not have this function.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Mar 2023 15:49:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Null-check-query/m-p/367169#M239822</guid>
      <dc:creator>mszurap</dc:creator>
      <dc:date>2023-03-29T15:49:25Z</dc:date>
    </item>
    <item>
      <title>Re: Null check query</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Null-check-query/m-p/367171#M239824</link>
      <description>&lt;P&gt;Thanks for the info! I am educating myself and certainly appreciate it.&lt;/P&gt;&lt;P&gt;Using NVL2, it looks like it either returns a 0 if a null is found, or the specified expression in the argument.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Even if I used Impala, NVL2 wouldn't work for me, as I need (and expect) the specific &lt;EM&gt;count&lt;/EM&gt; of null records across given columns, right? Just wondering. Thanks again!&lt;/P&gt;</description>
      <pubDate>Wed, 29 Mar 2023 16:12:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Null-check-query/m-p/367171#M239824</guid>
      <dc:creator>Supernova</dc:creator>
      <dc:date>2023-03-29T16:12:20Z</dc:date>
    </item>
    <item>
      <title>Re: Null check query</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Null-check-query/m-p/367175#M239826</link>
      <description>&lt;P&gt;I assume you meant that, when using Hive, the NVL function returns&amp;nbsp;&lt;SPAN&gt;0 or 1 if a null is found, or the specified expression in the argument.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;With Impala and NVL2 you would still need the outer query to "sum" up all the 1 values that we have mapped from the column values (from their real value to 0 or 1). It would just be a bit nicer, but no real change.&lt;/SPAN&gt;&lt;/P&gt;</description>
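      To make the point above concrete, here is a small sketch in plain Python rather than Impala SQL. The nvl2 function is a hypothetical stand-in modeled on the NVL2 description linked earlier in the thread, and null_count plays the role of the outer SUM. NVL2 alone only yields a per-row flag; the count appears once the flags are added up.

```python
def nvl2(value, if_not_null, if_null):
    """Stand-in modeled on Impala's NVL2: choose a result by null-ness of value."""
    return if_null if value is None else if_not_null


def null_count(values):
    """Map every value to a 0/1 flag, then add the flags (the outer SUM step)."""
    return sum(nvl2(v, 0, 1) for v in values)


if __name__ == "__main__":
    first_names = [None, "Ada", None, "Grace"]
    # Per-row flags, analogous to NVL2(CustFirstName, 0, 1) on each row.
    print([nvl2(v, 0, 1) for v in first_names])  # [1, 0, 1, 0]
    # The aggregate that answers "how many nulls are there?".
    print(null_count(first_names))  # 2
```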
      <pubDate>Wed, 29 Mar 2023 16:33:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Null-check-query/m-p/367175#M239826</guid>
      <dc:creator>mszurap</dc:creator>
      <dc:date>2023-03-29T16:33:42Z</dc:date>
    </item>
  </channel>
</rss>

