Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

How to find nth highest salary in hive.

Solved Go to solution

How to find nth highest salary in hive.

Rising Star

Hi guys,

I know hive is Data Warehousing tool which is not suited as OLTP. In regular RDBMS Structure i have way like "Correlated queries" helps to find nth highest salary from a table as following.

Is there any way in hive also.The query is related to true/false conditions.

SELECT * /*This is the outer query part */
FROM Employee Emp1
WHERE (N-1) = ( /* Subquery starts here */
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > Emp1.Salary)

1 ACCEPTED SOLUTION

Accepted Solutions

Re: How to find nth highest salary in hive.

. @Suresh Bonam yes you can calculate the nth element, see the following example:

Create sample table (emp=employees)

create table emp(id Int, salary Double)
row format delimited 
  fields terminated by ',' 
  stored as textfile

I have added the following data to that table:

1,423
2,23
3,67878
4,84
5,9999999
6,1
7,578
8,8877
9,4445
10,756
11,84

Understanding the data

Lets look at the sorted salaries first:

select * from emp order by salary;

Result:

emp.id	emp.salary
6	1
2	23
4	84
11	84
1	423
7	578
10	756
9	4445
8	8877
3	67878
5	9999999

Note: We have two identical salaries in the table (=84)

Create query to select unique salaries

In order to select the nth-element we need to remove the duplicate values (e.g. distinct) and create some kind of row counter or index. I decided to use Group-By to remove the duplicate salaries and row_number-UDF to generate the index.

select salary, ROW_NUMBER() over (ORDER BY salary) as row_no from emp group by salary

Result:

salary	row_no
1	1
23	2
84	3
423	4
578	5
756	6
4445	7
8877	8
67878	9
9999999	10

Putting everything together

select * from (
  select salary, ROW_NUMBER() over (ORDER BY salary) as row_no from emp group by salary) res 
where res.row_no = 4

Result:

res.salary    res.row_no
423	      4

Note: We received the correct result and not 84 since we've removed the duplicate salaries in the subquery :)

5 REPLIES 5

Re: How to find nth highest salary in hive.

. @Suresh Bonam yes you can calculate the nth element, see the following example:

Create sample table (emp=employees)

create table emp(id Int, salary Double)
row format delimited 
  fields terminated by ',' 
  stored as textfile

I have added the following data to that table:

1,423
2,23
3,67878
4,84
5,9999999
6,1
7,578
8,8877
9,4445
10,756
11,84

Understanding the data

Lets look at the sorted salaries first:

select * from emp order by salary;

Result:

emp.id	emp.salary
6	1
2	23
4	84
11	84
1	423
7	578
10	756
9	4445
8	8877
3	67878
5	9999999

Note: We have two identical salaries in the table (=84)

Create query to select unique salaries

In order to select the nth-element we need to remove the duplicate values (e.g. distinct) and create some kind of row counter or index. I decided to use Group-By to remove the duplicate salaries and row_number-UDF to generate the index.

select salary, ROW_NUMBER() over (ORDER BY salary) as row_no from emp group by salary

Result:

salary	row_no
1	1
23	2
84	3
423	4
578	5
756	6
4445	7
8877	8
67878	9
9999999	10

Putting everything together

select * from (
  select salary, ROW_NUMBER() over (ORDER BY salary) as row_no from emp group by salary) res 
where res.row_no = 4

Result:

res.salary    res.row_no
423	      4

Note: We received the correct result and not 84 since we've removed the duplicate salaries in the subquery :)

Re: How to find nth highest salary in hive.

Awesome! Marked as favorite post now. :-)

Re: How to find nth highest salary in hive.

Thanks :)

Re: How to find nth highest salary in hive.

New Contributor

hi i tried this but got the below error. pls do help me out. I am quite new to hive.

hive> select salary, ROW_NUMBER() over (ORDER BY salary) as row_no from emp group by salary; FAILED: ParseException line 1:33 mismatched input '(' expecting FROM near 'over' in from clause

Re: How to find nth highest salary in hive.

New Contributor

Use DENSE_RANK instead of ROW_NUMBER() with GROUP BY , as it will take only 1 map-reduce job to achieve it as compared to 2 map-reduce jobs by latter one.


select * from (

select salary, DENSE_RANK() over (ORDER BY salary DESC) as row_no from emp) res

where res.row_no = 2;

Don't have an account?
Coming from Hortonworks? Activate your account here