Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Hive QL - Aggregating within a group

avatar
New Member

Hi there,

I am new to the Hive QL language and am trying to solve the following problem.

I have a set of data with user Id's, each with a corresponding score. An example of the kind of data I have is below:

stackdata_clean.owneruseridstackdata_clean.score
15
26
35
14
24

I want to find the top 10 users by score. In other words, I want code to make a table like the below and then pick the top 10 users with the highest aggregate score from it:

stackdata_clean.owneruserid

stackdata_clean.score

210
19
35

My table name is stackdata_clean and the code I am trying to use is:

SELECT stackdata_clean.owneruserid,
SUM(stackdata_clean.score) over(PARTITION BY stackdata_clean.owneruserid)
FROM stackdata_clean
GROUP BY stackdata_clean.owneruserid
ORDER BY sum(stackdata_clean.score)DESC LIMIT 10;

I am being returned the following error:

Error while compiling statement: FAILED: 
SemanticException Failed to breakup Windowing invocations into Groups. 
At least 1 group must only depend on input columns. Also check for 
circular dependencies.
Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: 
Line 2:20 Invalid column reference 'score' [ERROR_STATUS]

Can anyone help solve this problem?

Any help is greatly appreciated!

Thanks in advance 🙂

1 ACCEPTED SOLUTION

avatar
Super Collaborator

You're doing a window function and a group by on the same column, and that seems to be your error.

Try this:

SELECT stackdata_clean.owneruserid, 
SUM(stackdata_clean.score) as sumscore
FROM stackdata_clean
GROUP BY stackdata_clean.owneruserid
ORDER BY sumscore DESC LIMIT 10;

View solution in original post

2 REPLIES 2

avatar
Super Collaborator

You're doing a window function and a group by on the same column, and that seems to be your error.

Try this:

SELECT stackdata_clean.owneruserid, 
SUM(stackdata_clean.score) as sumscore
FROM stackdata_clean
GROUP BY stackdata_clean.owneruserid
ORDER BY sumscore DESC LIMIT 10;

avatar
New Member

That worked - thanks a lot!