New Contributor
Posts: 2
Registered: ‎06-17-2014

How to group by ONE column without using aggregate functions?

[ Edited ]

Hello to everybody! I am wondering how to do a classic grouping like "select email,product,lastname from tablename group by email"

It seems I have to choose either all fields (no grouping at all then) or use aggregate functions like max... ( which will lead to wrong results)


Any answer is appreciated!

Thanks in Advance


Oh, almost found one possible way:

select email,product,lastname from (select email, min(lastname) as lastname from tablename group by email) tmptable inner join tablename on and tmptable.product = tablename.product)


But I hope there are smarter ways...

Cloudera Employee
Posts: 40
Registered: ‎08-15-2013

Re: How to group by ONE column without using aggregate functions?

Hi - 


You're right - you do have to mention every non-aggregation column in a GROUP BY. When you don't have any aggregations, GROUP BY isn't so useful because the most common use of the expression is to compute summaries over groups of rows.


I'm not 100% sure what you're trying to achieve, but if you just want your output to group all rows with the same e-mail together, you can use ORDER BY for this purpose:


SELECT email, product, lastname FROM tablename ORDER BY email LIMIT 10000;


The LIMIT clause is necessary for now because Impala 1.3.1 does not support external sort. The next release of Impala, v1.4, will include an external sort operator, and the LIMIT clause can be removed. For now, you should change the LIMIT value to the smallest value that is suitable for your requirements.




New Contributor
Posts: 2
Registered: ‎06-17-2014

Re: How to group by ONE column without using aggregate functions?

Hello Henry and thanks for your answer.

What I want to achieve is basically to remove duplicates of email (the real table is a bit more complex than my exampe).


Let's say I have a huge amount of emails, all belonging to different products, but now I watn to use them only distinct by email, no matter of which product they were chosen. In Mysql that's just select blah group by email and I have what I want.


Yes I know, the table structure COULD be one for emails and then 1/n relation tables, but the structure is like it is for other reasons...


As I want to use insert ... select for that operation, there is no easy other way in getting rid of that duplicates, I think...

So my first solution seems to be the best so far, right?