Created 03-24-2016 08:33 PM
Hi there,
I am new to the Hive QL language and am trying to solve the following problem.
I have a set of data with user Id's, each with a corresponding score. An example of the kind of data I have is below:
stackdata_clean.owneruserid | stackdata_clean.score |
1 | 5 |
2 | 6 |
3 | 5 |
1 | 4 |
2 | 4 |
I want to find the top 10 users by score. In other words, I want code to make a table like the below and then pick the top 10 users with the highest aggregate score from it:
stackdata_clean.owneruserid | stackdata_clean.score |
2 | 10 |
1 | 9 |
3 | 5 |
My table name is stackdata_clean and the code I am trying to use is:
SELECT stackdata_clean.owneruserid, SUM(stackdata_clean.score) over(PARTITION BY stackdata_clean.owneruserid) FROM stackdata_clean GROUP BY stackdata_clean.owneruserid ORDER BY sum(stackdata_clean.score)DESC LIMIT 10;
I am being returned the following error:
Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 2:20 Invalid column reference 'score' [ERROR_STATUS]
Can anyone help solve this problem?
Any help is greatly appreciated!
Thanks in advance 🙂
Created 03-25-2016 09:15 AM
You're doing a window function and a group by on the same column, and that seems to be your error.
Try this:
SELECT stackdata_clean.owneruserid, SUM(stackdata_clean.score) as sumscore FROM stackdata_clean GROUP BY stackdata_clean.owneruserid ORDER BY sumscore DESC LIMIT 10;
Created 03-25-2016 09:15 AM
You're doing a window function and a group by on the same column, and that seems to be your error.
Try this:
SELECT stackdata_clean.owneruserid, SUM(stackdata_clean.score) as sumscore FROM stackdata_clean GROUP BY stackdata_clean.owneruserid ORDER BY sumscore DESC LIMIT 10;
Created 03-26-2016 09:34 AM
That worked - thanks a lot!