I have an Impala query like this:
Select col_1, sum(col_2) from my_table where col_1 ('a', 'b') group by col_a;
The result shows values for 'a' and 'b' but also for 'NULL'. Why would this be?
Are you sure that's the exact query you ran? It looks semantically invalid because col_1 is not part of the group by clause. Can you please paste the exact query just to be sure there are no misunderstandings? Thanks!
Sorry - made a typo when I sanitised the query. Would be:
Select col_1, sum(col_2) from my_table where col_1 in ('a', 'b') group by col_1;
Result looks something like
a 17 b 59 NULL 426
The query was heavily sanitised for public consumption.
I will see what I can do in terms of putting together a reproduceable example. Will make it easier to raise a support case anyway.
We take this issue seriously since it is a correctness problem. Any additional non-sensitive info you can provide would be helpful, like the column types, table format, etc. Thanks for helping!
I have tried and been unable to reproduce the problem from scratch with a trivial table that matches the sanitised structure I have posted here on a quickstart VM.
I will get someone to spend a bit more time on this to see what they can reproduce on the cluster that experiences the issue.