Created on 02-13-2020 05:17 AM - last edited on 02-13-2020 07:34 PM by ask_bill_brooks
Hi,
I have 2 tables in Hive. Table-a has millions of rows and many columns. Table-b has just few rows with two columns (key as tablename, value as 1).
Now I do the following,
1) Cross join table a and b.
2) Based on table A's on of the date column, we need to pass the table b's column "value" as "1" or "2" and bring the table A's date column less than one or two days.
e.g.
select * from table-a, table-b where 1=1 and table-a.date_col <= from_unixtime(unix_timestamp()-(table-b.value*24*60*60))) - while the table-b.value is 1 then it subtracts one day or any number of hours.
Basically, we need to pass the table-b value as parameter according to our requirements. This cross join works for smaller tables. When the data of table-a is huge, it creates performance or memory issue.
Please advise on any solution to fine tune this approach or achieve the result using any alternate approach and overcome this issue in HIVE.
Thanks.
Created 02-13-2020 06:02 PM
Created 02-13-2020 11:38 PM
The table-b has only one row with a value 1. My intention is to pass this value in TABLE-b.value, and subtract the number of days in the query. If I pass 1 then it should subtract one day from the TABLE-a.date_col. If I pass 2 then it should give me records that are 2 days lesser than TABLE-a.date_col.
SELECT *
FROM TABLE-a,
TABLE-b
WHERE 1=1
AND TABLE-a.date_col <= from_unixtime(unix_timestamp()-(TABLE-b.value*24*60*60)))
Created 02-18-2020 03:54 PM