Created 10-05-2016 08:09 PM
HI,
I have 5 records in my database and i want to select the 3rd records.Minus is not possible in hive.So i'm trying Left outer join to perform the query.It's giving me some random results.
Id,Codes
1 100 1 200 2 200 3 200 3 300
select a.id,b.id from analytical a inner join (select id from analytical order by id desc limit 2) b on a.id=b.id where b.id is null order by a.id asc limit 3
OUtput:
id,codes
1 NULL 1 NULL 2 NULL
ANy suggestions?
Created 10-07-2016 06:25 PM
To re-iterate what @mqureshi already noticed, your query does not seem functionally correct. 1=1 is true, but null=null is false. Different story.
If you use LIMIT row_count with ORDER BY, Hive, like MySQL and many other SQL-like engines, ends the sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result.
ROW_NUMBER function and select third row is what you need.
Your idea of inner join will not scale for many records. If you have duplicates then write your query to eliminate the duplicate or deal with that, but one would still wonder how would you determine the true third row when you have a duplicate? I don't see anything in the query you wrote dealing with that problem. I think there is still a lot of work to do on paper before even writing SQL.
Created 10-05-2016 08:42 PM
Two things here.
1. I don't understand your use of "a.id = b.id where b.id is null". When b.id is null, a.id and b.id will never be equal. However, it's your query and you probably know more about it, so you can ignore my comment if you know what I am talking about.
2. I think you need to use ROW_NUMBER function and then select the third row. This link describes usage of ROW_NUMBER() for SQL 2005 but it's the same for Hive.
Created 10-06-2016 05:18 AM
1. i think An inner join between the same table twice with limit 1 will solve this issue .
2 . I cannot go with rownumber as you can see i have duplicate column values which will not give me correct results
Created 10-06-2016 10:47 AM
please use below query
with t as (
select row_number() over (order by Codes) as RowNum, ID,Codes
)
select * from t
where RowNum=3;
Created 10-06-2016 07:06 PM
@prasoon kumar with Doesn't work in hive
Created 10-07-2016 05:26 AM
Hi Vaibhav,
Which version of hive you are working with?
Please check below link
https://cwiki.apache.org/confluence/display/Hive/Common+Table+Expression
Created 10-07-2016 06:25 PM
To re-iterate what @mqureshi already noticed, your query does not seem functionally correct. 1=1 is true, but null=null is false. Different story.
If you use LIMIT row_count with ORDER BY, Hive, like MySQL and many other SQL-like engines, ends the sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result.
ROW_NUMBER function and select third row is what you need.
Your idea of inner join will not scale for many records. If you have duplicates then write your query to eliminate the duplicate or deal with that, but one would still wonder how would you determine the true third row when you have a duplicate? I don't see anything in the query you wrote dealing with that problem. I think there is still a lot of work to do on paper before even writing SQL.
Created 10-07-2016 07:24 PM
@Constantin Stanca yes it will not work in case in duplicates but in case of distinct values the below worked:
select a.id,b.id from analytical a inner join (select id from analytical order by id desc limit 3) b on a.id=b.id ORDER BY A.ID ASC LIMIT 1
Created 10-08-2016 01:07 AM
No question that this could work, just a bit concerned with performance. Order by is a very expensive operation involving a single reducer. Practically, you do one full table scan for that and once you get those three records you do another table scan to get the attributes of those records.
Look at an alternative here: https://community.hortonworks.com/questions/24667/hive-top-n-records-within-a-group.html
If any response in this thread was helpful, don't forget to vote/accept the best response.
Created 10-08-2016 07:34 PM
I'm not sure about my data because here even composite keys can produce duplicates. so going by analytical function is not a good choice for me.Any how query is not taking that much time for me.
I voted up for your Solution .Thanks For your Response. 🙂 @Constantin Stanca