Created 03-28-2016 01:10 AM
Hi all,
I have a table with the fields user_id and value and I want to order the values in descending order within each user_id and then only emit the top 100 records for each user_id. This is the code I am attempting to use:
DROP TABLE IF EXISTS mytable2 CREATE TABLE mytable2 AS SELECT * FROM (SELECT *, rank (user_id) as rank FROM (SELECT * from mytable DISTRIBUTE BY user_id SORT BY value DESC)a )b WHERE rank<101 ORDER BY rank;
However when I run this query, I get the following error:
Error while compiling statement: FAILED: SemanticException [Error 10247]: Missing over clause for function : rank [ERROR_STATUS]
Can anyone help?
Thanks in advance.
Created 03-28-2016 11:28 PM
Hello Maeve,
you did go a bit too deep. You can use the Rank clause in combination with an over statement that describes how Hive should rank the columns. Otherwise how would he know by what to rank the users. The partition by clause also tells hive to distribute by userid and to sort inside a userid without you needing to specify it specifically. Below is what you want right?
select * from ( select user_id, value, desc, rank() over ( partition by user_id order by value desc) as rank from test4 ) t where rank < 3;
The output looks like this:
OK 1 2 hallo 1 1 1 hallo 2 2 11 hallo4 1 2 10 hallo3 2
The input table looks like this:
hive> select * from test4; OK USER_ID VALUE DESC 1 1 hallo 1 2 hallo 2 3 hallo1 2 7 hallo1 2 10 hallo3 2 11 hallo4
Created 03-28-2016 11:28 PM
Hello Maeve,
you did go a bit too deep. You can use the Rank clause in combination with an over statement that describes how Hive should rank the columns. Otherwise how would he know by what to rank the users. The partition by clause also tells hive to distribute by userid and to sort inside a userid without you needing to specify it specifically. Below is what you want right?
select * from ( select user_id, value, desc, rank() over ( partition by user_id order by value desc) as rank from test4 ) t where rank < 3;
The output looks like this:
OK 1 2 hallo 1 1 1 hallo 2 2 11 hallo4 1 2 10 hallo3 2
The input table looks like this:
hive> select * from test4; OK USER_ID VALUE DESC 1 1 hallo 1 2 hallo 2 3 hallo1 2 7 hallo1 2 10 hallo3 2 11 hallo4
Created 03-29-2016 09:21 AM
Thanks a lot Benjamin - I did realise after posting the above that I needed a UDF to use with the rank function on its own. It's working now so thank you.