06-17-2014 03:07 AM - edited 06-17-2014 05:36 AM
Hello to everybody! I am wondering how to do a classic grouping like "select email,product,lastname from tablename group by email"
It seems I have to choose either all fields (no grouping at all then) or use aggregate functions like max... ( which will lead to wrong results)
Any answer is appreciated!
Thanks in Advance
Oh, almost found one possible way:
select email,product,lastname from (select email, min(lastname) as lastname from tablename group by email) tmptable inner join tablename on tmptable.email=tablename.email and tmptable.product = tablename.product)
But I hope there are smarter ways...
06-17-2014 10:35 AM
You're right - you do have to mention every non-aggregation column in a GROUP BY. When you don't have any aggregations, GROUP BY isn't so useful because the most common use of the expression is to compute summaries over groups of rows.
I'm not 100% sure what you're trying to achieve, but if you just want your output to group all rows with the same e-mail together, you can use ORDER BY for this purpose:
SELECT email, product, lastname FROM tablename ORDER BY email LIMIT 10000;
The LIMIT clause is necessary for now because Impala 1.3.1 does not support external sort. The next release of Impala, v1.4, will include an external sort operator, and the LIMIT clause can be removed. For now, you should change the LIMIT value to the smallest value that is suitable for your requirements.
06-17-2014 11:08 AM
Hello Henry and thanks for your answer.
What I want to achieve is basically to remove duplicates of email (the real table is a bit more complex than my exampe).
Let's say I have a huge amount of emails, all belonging to different products, but now I watn to use them only distinct by email, no matter of which product they were chosen. In Mysql that's just select blah group by email and I have what I want.
Yes I know, the table structure COULD be one for emails and then 1/n relation tables, but the structure is like it is for other reasons...
As I want to use insert ... select for that operation, there is no easy other way in getting rid of that duplicates, I think...
So my first solution seems to be the best so far, right?