<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Hive 3.1.4 Multiple rows from COUNT(*) query in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Hive-3-1-4-Multiple-rows-from-COUNT-query/m-p/313875#M225777</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;on Hive 3.1.4 we have a COUNT(*) query that returns more than one row instead of exactly one.&lt;/P&gt;&lt;P&gt;I created this test table:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&lt;FONT face="courier new,courier,monospace" size="3"&gt;CREATE TABLE test_case (&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace" size="3"&gt;cod_pers STRING,&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace" size="3"&gt;cod_address STRING,&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace" size="3"&gt;PRIMARY KEY (cod_pers) DISABLE NOVALIDATE )&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace" size="3"&gt;PARTITIONED BY (num_snapshot BIGINT) ;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;and populated it with more than 500,000 rows in 3 partitions.&lt;/P&gt;&lt;P&gt;To find how many people changed address, I wrote this query:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&lt;FONT face="courier new,courier,monospace"&gt;WITH data1 AS ( -- eliminate duplicates&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace"&gt;select t.cod_pers, t.cod_address, count(*) as num_address&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace"&gt;from test_case as t&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace"&gt;group by t.cod_pers, t.cod_address&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace"&gt;), data2 AS ( -- find changes per person&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace"&gt;select s.cod_pers, count(*) as num_changes&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace"&gt;from data1 s&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace"&gt;group by s.cod_pers&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace"&gt;having count(*)&amp;gt;1 &lt;/FONT&gt;&lt;FONT face="courier 
new,courier,monospace"&gt;)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace"&gt;select count(*) as num_all_changes from data2 as gg ;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;Instead of obtaining a single row with the total number, the query returns 4 rows :&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&lt;FONT face="courier new,courier,monospace"&gt;+------------------+&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace"&gt;| num_all_changes |&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace"&gt;+------------------+&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace"&gt;| 63 |&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace"&gt;| 58 |&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace"&gt;| 64 |&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace"&gt;| 59 |&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace"&gt;+------------------+&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier,monospace"&gt;4 rows selected (1.252 seconds)&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;4 is also the number of reducers used by the query.&lt;/P&gt;&lt;P&gt;If I add a "LIMIT 1" clause, the query works as expected, and a final reducer is added in the query execution. The same happens adding "GROUP BY 1" clause.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried to COMPACT the table and have the statistics recalculated, but nothing changed.&lt;/P&gt;&lt;P&gt;I know I can rewrite the query, but I'm not looking for a work-aroud: I wonder if it's a known bug of CBO and if someone else experimented the same behavior (and also if a patch is available).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance&lt;/P&gt;</description>
    <pubDate>Wed, 31 Mar 2021 08:17:55 GMT</pubDate>
    <dc:creator>giovannimori</dc:creator>
    <dc:date>2021-03-31T08:17:55Z</dc:date>
  </channel>
</rss>

