Created on 10-26-2017 08:41 AM - edited 09-16-2022 05:27 AM
Hi All,
I am able to run the below query in hive but not in impala.
I am using CDH 5.5.4
Is it possible in impala or not?
QUERY:
Select count(distinct(concat(c1,c2))) as Key, sum(distinct(c3)) as Val
FROM test;
In HIve it is successfully executed but in impala i am getting the below error.
AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT (concat(c1,c2))); deviating function: sum(DISTINCT (c3))
Any suggesstions on how to do it in a better way if not possible in impala will be helpful.
thanks
Nisith
Created 10-26-2017 10:15 AM
That currently is not possible, sorry.
Please see the corresponding JIRA for workarounds:
https://issues.apache.org/jira/browse/IMPALA-110
In particular, using NDV() iinstead of COUNT(DISTINCT) is much faster, but returns approximate results only.
Created on 10-27-2017 02:04 AM - edited 10-27-2017 02:55 AM
Hi Alex,
Thanks for your reply.
The jira link you have shared seems to be a pretty old one and there are some comments saying this may be available in impala 2.2 onwards but as per my information cdh 5.5.4 has impala 2.3.
it seems this issue was raised 4 years back but still not been implented yet as we are using microstaregy as our bI tool, so doing cross joins with the large tables are not a feasible solution in it.
Is this a fundamental constaraint in impala or it may come in a later impala version?
it will be great if you can give an hint in which impala version or cdh version this functionality may be valuable.
it will be helpfull for sure to the future users.
Nisith
Created 10-27-2017 08:55 PM
Please also see my response on the JIRA. In addition:
The fact that the JIRA is old has no bearing on priority. We prioritize according to the needs of our users. We have limited resources and make difficult choices on what to work on.