Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive QL - Aggregating within a group

avatar
Contributor

Hi there,

I am new to the Hive QL language and am trying to solve the following problem.

I have a set of data with user Id's, each with a corresponding score. An example of the kind of data I have is below:

stackdata_clean.owneruseridstackdata_clean.score
15
26
35
14
24

I want to find the top 10 users by score. In other words, I want code to make a table like the below and then pick the top 10 users with the highest aggregate score from it:

stackdata_clean.owneruserid

stackdata_clean.score

210
19
35

My table name is stackdata_clean and the code I am trying to use is:

SELECT stackdata_clean.owneruserid,
SUM(stackdata_clean.score) over(PARTITION BY stackdata_clean.owneruserid)
FROM stackdata_clean
GROUP BY stackdata_clean.owneruserid
ORDER BY sum(stackdata_clean.score)DESC LIMIT 10;

I am being returned the following error:

Error while compiling statement: FAILED: 
SemanticException Failed to breakup Windowing invocations into Groups. 
At least 1 group must only depend on input columns. Also check for 
circular dependencies.
Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: 
Line 2:20 Invalid column reference 'score' [ERROR_STATUS]

Can anyone help solve this problem?

Any help is greatly appreciated!

Thanks in advance 🙂

1 ACCEPTED SOLUTION

avatar
Super Collaborator

You're doing a window function and a group by on the same column, and that seems to be your error.

Try this:

SELECT stackdata_clean.owneruserid, 
SUM(stackdata_clean.score) as sumscore
FROM stackdata_clean
GROUP BY stackdata_clean.owneruserid
ORDER BY sumscore DESC LIMIT 10;

View solution in original post

2 REPLIES 2

avatar
Super Collaborator

You're doing a window function and a group by on the same column, and that seems to be your error.

Try this:

SELECT stackdata_clean.owneruserid, 
SUM(stackdata_clean.score) as sumscore
FROM stackdata_clean
GROUP BY stackdata_clean.owneruserid
ORDER BY sumscore DESC LIMIT 10;

avatar
Contributor

That worked - thanks a lot!