<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Count Distinct discrepancy --Hive in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Count-Distinct-discrepancy-Hive/m-p/225085#M186948</link>
    <description>&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/77763-screen-shot-2018-06-19-at-104245-pm.png"&gt;screen-shot-2018-06-19-at-104245-pm.png&lt;/A&gt;&lt;/P&gt;&lt;P&gt;COUNT(DISTINCT) doesn't always give me the right answer. I've attached two different queries that should both return 7 unique items purchased. Unless I apply an operation to mdse_item_i, such as casting it to bigint, the distinct values are not always counted correctly.&lt;/P&gt;&lt;P&gt;To put it simply: when I cast mdse_item_i, the distinct count is 7, but when I don't cast it, the distinct count is 10, which is incorrect. Attachments: &lt;A href="https://community.cloudera.com/legacyfs/online/attachments/77760-countsdistinct-1.txt"&gt;countsdistinct-1.txt&lt;/A&gt;, &lt;A href="https://community.cloudera.com/legacyfs/online/attachments/77761-doesnotcountdistinct.txt"&gt;doesnotcountdistinct.txt&lt;/A&gt;&lt;/P&gt;&lt;P&gt;hive&amp;gt; select * from dfr_distinct;
OK
100000000938 5 &lt;STRONG&gt;7&lt;/STRONG&gt; 12.33 2 2.75 4.27 8.060 2 8 0
Time taken: 0.479 seconds, Fetched: 1 row(s)
hive&amp;gt; select * from dfr_distinctnot;
OK
100000000938 5 &lt;STRONG&gt;10&lt;/STRONG&gt; 12.33 2 2.75 4.27 8.06 0 2 8 0
Time taken: 0.932 seconds, Fetched: 1 row(s)&lt;/P&gt;&lt;P&gt;I tried running the query in both MR and Tez modes; both still give the same incorrect result when I don't cast.&lt;/P&gt;</description>
    <pubDate>Wed, 20 Jun 2018 10:41:58 GMT</pubDate>
    <dc:creator>reddyr211</dc:creator>
    <dc:date>2018-06-20T10:41:58Z</dc:date>
  </channel>
</rss>

