- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
How to find nth highest salary in hive.
- Labels:
-
Apache Hive
Created ‎12-22-2015 11:45 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi guys,
I know hive is Data Warehousing tool which is not suited as OLTP. In regular RDBMS Structure i have way like "Correlated queries" helps to find nth highest salary from a table as following.
Is there any way in hive also.The query is related to true/false conditions.
SELECT * /*This is the outer query part */ FROM Employee Emp1 WHERE (N-1) = ( /* Subquery starts here */ SELECT COUNT(DISTINCT(Emp2.Salary)) FROM Employee Emp2 WHERE Emp2.Salary > Emp1.Salary)
Created ‎12-22-2015 12:55 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
. @Suresh Bonam yes you can calculate the nth element, see the following example:
Create sample table (emp=employees)
create table emp(id Int, salary Double) row format delimited fields terminated by ',' stored as textfile
I have added the following data to that table:
1,423 2,23 3,67878 4,84 5,9999999 6,1 7,578 8,8877 9,4445 10,756 11,84
Understanding the data
Lets look at the sorted salaries first:
select * from emp order by salary;
Result:
emp.id emp.salary 6 1 2 23 4 84 11 84 1 423 7 578 10 756 9 4445 8 8877 3 67878 5 9999999
Note: We have two identical salaries in the table (=84)
Create query to select unique salaries
In order to select the nth-element we need to remove the duplicate values (e.g. distinct) and create some kind of row counter or index. I decided to use Group-By to remove the duplicate salaries and row_number-UDF to generate the index.
select salary, ROW_NUMBER() over (ORDER BY salary) as row_no from emp group by salary
Result:
salary row_no 1 1 23 2 84 3 423 4 578 5 756 6 4445 7 8877 8 67878 9 9999999 10
Putting everything together
select * from ( select salary, ROW_NUMBER() over (ORDER BY salary) as row_no from emp group by salary) res where res.row_no = 4
Result:
res.salary res.row_no 423 4
Note: We received the correct result and not 84 since we've removed the duplicate salaries in the subquery 🙂
Created ‎12-22-2015 12:55 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
. @Suresh Bonam yes you can calculate the nth element, see the following example:
Create sample table (emp=employees)
create table emp(id Int, salary Double) row format delimited fields terminated by ',' stored as textfile
I have added the following data to that table:
1,423 2,23 3,67878 4,84 5,9999999 6,1 7,578 8,8877 9,4445 10,756 11,84
Understanding the data
Lets look at the sorted salaries first:
select * from emp order by salary;
Result:
emp.id emp.salary 6 1 2 23 4 84 11 84 1 423 7 578 10 756 9 4445 8 8877 3 67878 5 9999999
Note: We have two identical salaries in the table (=84)
Create query to select unique salaries
In order to select the nth-element we need to remove the duplicate values (e.g. distinct) and create some kind of row counter or index. I decided to use Group-By to remove the duplicate salaries and row_number-UDF to generate the index.
select salary, ROW_NUMBER() over (ORDER BY salary) as row_no from emp group by salary
Result:
salary row_no 1 1 23 2 84 3 423 4 578 5 756 6 4445 7 8877 8 67878 9 9999999 10
Putting everything together
select * from ( select salary, ROW_NUMBER() over (ORDER BY salary) as row_no from emp group by salary) res where res.row_no = 4
Result:
res.salary res.row_no 423 4
Note: We received the correct result and not 84 since we've removed the duplicate salaries in the subquery 🙂
Created ‎12-22-2015 02:35 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Awesome! Marked as favorite post now. 🙂
Created ‎12-22-2015 05:15 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks 🙂
Created ‎02-13-2020 11:33 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ho to find 2nd(nth) highest salary in each department? Please find below sample dataset.
1,423,depA
2,23,depA
3,67878,depB
11,84,depB
5,9999999,depA
9,4445,depA
8,8877,depB
10,756,depC
4,84,depC
7,578,depA
6,1,depC
output-
10,756,depC
4,84,depC
6,1,depC
Created ‎02-13-2020 04:35 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @LetsHadoop,
As this thread is older and was marked 'Solved' in Dec 2015 you would have a better chance of receiving a helpful answer by starting a new thread. This will also provide you with an opportunity to provide details about your table design that could aid others in providing an answer more directly tailored to your question.
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Created ‎03-29-2016 12:52 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
hi i tried this but got the below error. pls do help me out. I am quite new to hive.
hive> select salary, ROW_NUMBER() over (ORDER BY salary) as row_no from emp group by salary; FAILED: ParseException line 1:33 mismatched input '(' expecting FROM near 'over' in from clause
Created ‎05-14-2019 07:57 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Use DENSE_RANK instead of ROW_NUMBER() with GROUP BY , as it will take only 1 map-reduce job to achieve it as compared to 2 map-reduce jobs by latter one.
select * from (
select salary, DENSE_RANK() over (ORDER BY salary DESC) as row_no from emp) res
where res.row_no = 2;
Created ‎06-23-2020 06:24 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Please watch the below video:-
https://www.youtube.com/watch?v=fBeN8VQm0_Q
Please like comment share and subscribe my channel, If you loved it.
