08-09-2016 07:59 PM
I have an Impala query like this:
Select col_1, sum(col_2) from my_table where col_1 ('a', 'b') group by col_a;
The result shows values for 'a' and 'b' but also for 'NULL'. Why would this be?
08-09-2016 09:30 PM
Are you sure that's the exact query you ran? It looks semantically invalid because col_1 is not part of the group by clause. Can you please paste the exact query just to be sure there are no misunderstandings? Thanks!
08-09-2016 09:34 PM - edited 08-09-2016 09:36 PM
Sorry - made a typo when I sanitised the query. Would be:
Select col_1, sum(col_2) from my_table where col_1 in ('a', 'b') group by col_1;
Result looks something like
a 17 b 59 NULL 426
08-10-2016 06:37 PM
The query was heavily sanitised for public consumption.
I will see what I can do in terms of putting together a reproduceable example. Will make it easier to raise a support case anyway.
08-10-2016 06:39 PM
We take this issue seriously since it is a correctness problem. Any additional non-sensitive info you can provide would be helpful, like the column types, table format, etc. Thanks for helping!
08-22-2016 01:25 AM
I have tried and been unable to reproduce the problem from scratch with a trivial table that matches the sanitised structure I have posted here on a quickstart VM.
I will get someone to spend a bit more time on this to see what they can reproduce on the cluster that experiences the issue.