Reply
New Contributor
Posts: 4
Registered: ‎02-02-2016

can i do a count and distinct on 2 different columns in a single select statement in Impala

[ Edited ]

Hi All,

 

I am able to run the below query in hive but not in impala.

I am using CDH 5.5.4

Is it possible in impala or not?

 

QUERY: 

Select count(distinct(concat(c1,c2))) as Key, sum(distinct(c3)) as Val

FROM test;

 

In HIve it is successfully executed but in impala i am getting the below error.

 

AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT (concat(c1,c2))); deviating function: sum(DISTINCT (c3))

 

Any suggesstions on how to do it in a better way if not possible in impala will be helpful.

 

thanks

Nisith

Cloudera Employee
Posts: 307
Registered: ‎10-16-2013

Re: can i do a count and distinct on 2 different columns in a single select statement in Impala

That currently is not possible, sorry.

 

Please see the corresponding JIRA for workarounds:

https://issues.apache.org/jira/browse/IMPALA-110

 

In particular, using NDV() iinstead of COUNT(DISTINCT) is much faster, but returns approximate results only.

New Contributor
Posts: 4
Registered: ‎02-02-2016

Re: can i do a count and distinct on 2 different columns in a single select statement in Impala

[ Edited ]

Hi Alex,

 

Thanks for your reply.

The jira link you have shared seems to be a pretty old one and there are some comments saying this may be available in impala 2.2 onwards but as per my information cdh 5.5.4 has impala 2.3.

 

it seems this issue was raised 4 years back but still not been implented yet as we are using microstaregy as our bI tool, so doing cross joins with the large tables are not a feasible solution in it.

 

Is this a fundamental constaraint in impala or it may come in a later impala version?

it will be great if you can give an hint in which impala version or cdh version this functionality may be valuable.

 

it will be helpfull for sure to the future users.

 

Nisith

Highlighted
Cloudera Employee
Posts: 307
Registered: ‎10-16-2013

Re: can i do a count and distinct on 2 different columns in a single select statement in Impala

Please also see my response on the JIRA. In addition:

 

The fact that the JIRA is old has no bearing on priority. We prioritize according to the needs of our users. We have limited resources and make difficult choices on what to work on.