Support Questions

Find answers, ask questions, and share your expertise

hive query to check for entries added in a particular month that do not exist already in the database

avatar
Expert Contributor

I need to check for new emailId's added in 06-2016 that have not existed in database ever before that.

I wrote the query using NOT IN operator:

select DISTINCT SF.customer_email
FROM 
Magento.sales_flat_order SF
WHERE
YEAR(TO_DATE(SF.created_at)) = '2016'
AND 
MONTH(TO_DATE(SF.created_at)) = '6'
AND 
SF.customer_email 
NOT IN (
select SFO.customer_email FROM Magento.Sales_flat_order SFO 
WHERE
TO_DATE(SFO.created_at) <= '2016-05-31'
)

I have verified that there are new emaiid's in the time but it returns empty result-set. Why is that? Infact, When I replace NOT IN operator with IN operator, it does return me the common ones but somehow NOT IN is behaving erratically.

Is there an alternative way I can do it?

1 ACCEPTED SOLUTION

avatar

Hi @Simran Kaur. I added an answer to your other version of this question found here (https://community.hortonworks.com/questions/43885/not-in-operator-in-hive-with-a-subquery.html#answer-43944)

One other thing I like to look for is leading or trailing blanks in the comparison field (email address). But in this instance it should not cause a problem since you are using the same table (and the same email addresses) in the inner and outer query.

View solution in original post

1 REPLY 1

avatar

Hi @Simran Kaur. I added an answer to your other version of this question found here (https://community.hortonworks.com/questions/43885/not-in-operator-in-hive-with-a-subquery.html#answer-43944)

One other thing I like to look for is leading or trailing blanks in the comparison field (email address). But in this instance it should not cause a problem since you are using the same table (and the same email addresses) in the inner and outer query.