Support Questions

Find answers, ask questions, and share your expertise

can i do a count and distinct on 2 different columns in a single select statement in Impala

avatar
Explorer

Hi All,

 

I am able to run the below query in hive but not in impala.

I am using CDH 5.5.4

Is it possible in impala or not?

 

QUERY: 

Select count(distinct(concat(c1,c2))) as Key, sum(distinct(c3)) as Val

FROM test;

 

In HIve it is successfully executed but in impala i am getting the below error.

 

AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT (concat(c1,c2))); deviating function: sum(DISTINCT (c3))

 

Any suggesstions on how to do it in a better way if not possible in impala will be helpful.

 

thanks

Nisith

3 REPLIES 3

avatar

That currently is not possible, sorry.

 

Please see the corresponding JIRA for workarounds:

https://issues.apache.org/jira/browse/IMPALA-110

 

In particular, using NDV() iinstead of COUNT(DISTINCT) is much faster, but returns approximate results only.

avatar
Explorer

Hi Alex,

 

Thanks for your reply.

The jira link you have shared seems to be a pretty old one and there are some comments saying this may be available in impala 2.2 onwards but as per my information cdh 5.5.4 has impala 2.3.

 

it seems this issue was raised 4 years back but still not been implented yet as we are using microstaregy as our bI tool, so doing cross joins with the large tables are not a feasible solution in it.

 

Is this a fundamental constaraint in impala or it may come in a later impala version?

it will be great if you can give an hint in which impala version or cdh version this functionality may be valuable.

 

it will be helpfull for sure to the future users.

 

Nisith

avatar

Please also see my response on the JIRA. In addition:

 

The fact that the JIRA is old has no bearing on priority. We prioritize according to the needs of our users. We have limited resources and make difficult choices on what to work on.