I have a hive table that has 463M records. I am joining this table with a smaller one using the left join(left is the big table) When filtering using the where clause the it drops down to 4M which is wrong. When I use the AND instead of where I get 463M which is wrong. Is there a different way on doing it? The correct answer should be 171M.
here is my code:
drop table if exists step_1; CREATE TABLE STEP_1 AS SELECT t1.Sales_Account_ID_INT, t2.SALES_ACCOUNT_GROUP_NAME, t1.CR_Site_ID, t1.Technology, t1.VMS_TOP_CD, t1.Primary_SAV_Flag, t2.L1_SALES_TERRITORY_DESCR, t2.L2_SALES_TERRITORY_DESCR, t2.L3_SALES_TERRITORY_DESCR, t2.L4_SALES_TERRITORY_DESCR, t2.L5_SALES_TERRITORY_DESCR, t2.L6_SALES_TERRITORY_DESCR, t1.AdjWallet, cast(FY as int) AS FISCAL_YEAR,
(CASE WHEN t1.Bookings< 0 or t1.Bookings is null THEN 0 ELSE t1.Bookings END) AS Bookings FROM UNIF_WALL_SALESVIEW_Q4FY16 t1 LEFT JOIN SAV_ATTRIBUTES_Q4FY16 t2 ON (t1.Sales_Account_ID_INT = t2.SALES_ACCOUNT_ID_INT) (Here I tried the Where and AND) t1.Primary_SAV_Flag <> 'N' AND t1.Sales_Account_ID_INT IS NOT NULL AND t1.FY IN ( '2012', '2013', '2014', '2015', '2016' ) AND t1.Bookings <= t1.AdjWallet and t1.AdjWallet >=0 ORDER BY t1.Sales_Account_ID_INT, Fiscal_Year;