Support Questions
Find answers, ask questions, and share your expertise

Non-equi join in hive to get dim data between fact table start and begin dates

SELECT dim.Date1, * from fact join Dim on Dim.Date1 between fact.Begin_date and fact.End_date

I have this sql query and I need to convert it to hive query. I know that non equi joins does not work in hive and tried this but it is not working.

SELECT dim.Date1, * from Dim,fact where dim.date1 between fact.Begin_date and fact.End_date

Any thoughts?

4 REPLIES 4

Super Guru

@ganesh kumar morla why are you not using between operator in hive? also for push down predicate for non equi join consider using cross join

@Sunile Manjee between operator in non equi joins is not working, can you write down a simple code in those lines so that I can understand better. Thanks for your time and help!

@ganesh kumar morla

Option 1:

As @Sunile Manjee also suggested, you could give cross-join a shot. Your query would look something like this:

SELECT dim.Date1, fact.* FROM fact join dim WHERE dim.Date1 >= fact.Begin_date and dim.Date1 <= fact.End_date;

This is a cartesian product and it could take serious resources. You need to understand the data in both tables and estimate some parameters that will help your query parallelism. Look here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_installing_manually_book/content/ref-ffe....

Option 2:

Before 0.10 when cross-join was introduced, I used a different workaround. Here is an example. I did not write it to your case exactly. Feel free to adapt to your needs. You could also create temporary tables. _serial and _inner tables below will be out of your dim table which supposedly is small when compared with fact table. You get the point.

1. Create a copy of the left table with added serial row numbers:

CREATE TABLE OrderLineItem_serial AS
SELECT ROW_NUMBER() OVER() AS serial, * FROM OrderLineItem; 

2. Do an inner join:

CREATE TABLE OrderLineItem_inner AS
SELECT * FROM OrderLineItem_serial li JOIN ProductPrice p
on p.ProductID = li.ProductID WHERE OrderDate BETWEEN startDate AND EndDate; 

3. Left join by serial:

SELECT * FROM OrderLineItem_serial li
LEFT OUTER JOIN OrderLineItem_inner i on li.serial = i.serial;

If you find your solution in responses provided, please vote and accept the best answer.

we have purchased presto for this and it works much faster

https://prestodb.io/docs/current/