I am running the following query on the same data (same tables, and the same number of records on those tables), but it gives a different result on Impala than what I get on SQL Server.
The following on Impala:
and this is on SQL Server:
I am very sure that the data is the same in everything ... actually, I have imported the data from SQL Server through Sqoop and after that made sure that the number of records is the same in the source and destination .... yet I don't know why I'm getting a defferent result here and there ?!!!
Can you bring us this queries results in SQL server and Impala:
Select avg(tagno) from tag; Select avg(tagno) from has_tag; Select count(*) from tag where tagno is null; Select count(*) from has_tag where tagno is null;
Also try to add this on Impala query, and let us know if there is any change:
... Inner join has_tags hit on (s.tagno = hit.tagno and s.categorycode = hit.categorycode) ...
It'd be helpful to post your impala version too.
It seems unlikely that either SQL engine would return incorrect results on a straightforward query like this.
I'd suggest looking at a subset of the data and breaking down the query until you can see where the different lies, e.g.
select * FROM tag s INNER JOIN has_tags ht on S.TagNo = HT.TagNo and S.CategoryCode = HT.CategoryCode WHERE ht.categorycode = 'SYS'